PREFACE TO REVISED EDITION

During the eight years since the first edition of this book
appeared, there has been a remarkable development in the
use of statistics and statistical methods. This has come about
in part because of the need for quantitative data during and
following the World War, and also because of the growing
appreciation that social, political, business, and economic 
policies should rest upon a factual basis. 

The development has taken a variety of forms. Statistics 
and statistical methods now constitute an important part of 
college and university instruction; banks, research agencies, 
and the government, particularly, publish statistics on a wide 
variety of topics relating to trade and industry, social and 
industrial progress, and business conditions. Moreover, the 
larger business firms now have their own statistical depart- 
ments in which they collect and interpret facts about their 
own affairs, and in which they use those collected by others. 
There is scarcely an economic or social issue which is not being 
treated statistically. A renaissance of interest in all phases 
of statistics seems to have captivated the business and social 
world. 

While this is gratifying, it raises two questions in which 
teachers of statistics and practicing statisticians are vitally in- 
terested: (1) What type of training is necessary in order to 
develop men and women skilled in the preparation, use, and
interpretation of statistics? and (2) How should the intro- 
ductory subject matter of statistics and statistical methods 
be presented? The writer, during the past fifteen years, has 
given the better part of his time and attention to a considera- 
tion of these and similar inquiries, and the revised edition 
of this book contains his answers.

This edition, while retaining the distinctive features of the 
one which it supplants, records the progress which has been 
made in the technique and use of statistics since 1917. The 
subject matter is discussed in keeping with the well-estab- 
lished pedagogical principle that skill and judgment in the 
use of statistics can be best acquired when the methods are 
presented in the order in which they are used in statistical 
analysis. 

The book, it is hoped, is more than a “statistical arithmetic,”
or even a compendium of statistical practices. A
conscious effort has been made to give it body and substance, 
and to state and illustrate the principles back of numerical 
calculation and manipulation. Mathematical formulae and de- 
scriptive methods of how to use statistics, while fully ex- 
plained, are discussed in connection with the logical place 
which they hold in scientific thinking. Statistical analysis, 
requiring as it does observation of facts, their measurement, 
suitable analysis, and logical inference, is treated broadly and
fundamentally. The book is concerned with the statistical 
ways in which each of the steps in constructive thinking should 
be carried out. It is intended to be an essay in applied logic. 
While designed as an introduction to the subject, it is broad 
enough in scope, it is believed, to supply the basis for a thor- 
ough understanding of the elementary principles of statistics 
and statistical methods. 

In the revision, the book has been entirely rewritten, en- 
larged, simplified, rearranged, more fully illustrated, and, it 
is hoped, the principles more accurately stated. Among the 
changes that have been made are the following: Chapters II 
and XI, in the old book, are now Chapters II and III, and X 
and XII, respectively. New chapters on The Theory of Prob- 
ability and some Properties of the Normal Law of Error Dis- 
tribution, and on The Treatment and Correlation of Time 
Series have been added. Those relating to The Principles of 
Index Number Making and Using, and American Index Numbers
Described and Compared, have been entirely recast and
given new positions in the order of treatment. Both the principles
and methods of constructing index numbers of quantities,
prices, trade, general business conditions, etc., are fully
discussed and illustrated. All of the chapters have been care- 
fully revised, and an Appendix added. The latter includes a 
table of Powers, Roots, and Reciprocals, and a table of Four- 
Place Common Logarithms. Indeed, in its present form the 
book may be called new. 

For suggestions and assistance in the revision, I am indebted 
to the students of Northwestern University who, during the 
past eight years, have constituted a laboratory in which the 
pedagogical problems of instruction in statistics and statistical 
methods have been observed; to instructors of statistics in
other universities and to practicing statisticians with whom 
I have discussed the subject matter; to Professor A. L. Bowley 
of the London School of Economics and Political Science, who 
was kind enough to read in manuscript the first eight chapters, 
and to discuss with me, personally and at length, the different
phases of statistical methods; to Professor G. Udny Yule, 
Cambridge University, England, Professor D. Caradog Jones, 
University of Liverpool, and a number of others from whom 
I received valuable suggestions while studying in English universities
the contents of courses in Statistics and the methods
of instruction; to E. J. Moulton, Professor of Mathematics, 
Northwestern University, who read the revision in manuscript; 
and to Miss Blanche L. Altman, Lecturer in Statistics, North- 
western University, and Miss Gretchen Seibert, my Secretary, 
both of whom assisted in the laborious task of preparing the
matter for publication and in seeing it through the press. 


June 1, 1925. 


Horace Secrist.




PREFACE TO FIRST EDITION 

The following chapters are an attempt to work out an 
introductory, but at the same time a comprehensive, text 
on statistical methods for the use of college students and 
students in colleges of business administration. They are also
intended to supply the need for a fundamental treatment of 
the methods of statistical investigation and interpretation. 

Statistical methods are regarded as means rather than as ends,
as constituting simply one phase of general methodology, and
as including not only methods of analyzing but also of collecting
and assembling statistical data. The methods discussed
are of general application although the illustrations,
for the most part, are drawn from economic and business 
fields. 

The order of treatment is the same as that followed in the 
planning and analysis of a statistical problem, and it is hoped 
that statisticians, business executives, and students of statis- 
tical methods generally will find the volume not only a com- 
pendium of statistical procedure but also a guide in the process 
of logical statistical analysis. Emphasis is given to the neces- 
sity of a clear formulation of the problem in mind, to the 
meaning, collecting, and assembling of data, and to the neces- 
sity of a rigid interpretation and use of units of measurements. 
All of these steps are held to be preliminary but indispensable 
to the formulation of a statistical judgment, and to the em- 
ployment of the refinements of mathematical analysis which 
alone are too generally associated with “statistical methods.”

The treatment is non-mathematical for several reasons, 
chief of which are, that the mathematical phases of the subject 
are treated in other places, and that there seems to be an 
urgent need for a fundamental discussion of the non-mathematical,
but not less vital, processes in statistical investigation
and analysis. Experience in teaching statistics both to college 
students and business men, as well as in conducting statistical 
investigations, has demonstrated the need for such a treatment. 
It has been the aim at every stage of the discussion to develop 
the “why” of statistics, and concretely to relate methods to
the problems of public and private economics. 

The bibliographical aids at the close of the several chapters 
are not meant to be inclusive, but are chosen because of their 
value to students and others as collateral reading. A discus- 
sion of certain of them along with the text treatment, and in 
the light of the laboratory problems assigned, has proved 
helpful in the author's classes.

I am indebted to Professor Willard E. Hotchkiss, formerly 
Dean of the Northwestern University School of Commerce, 
and to Professor John F. Hayford, Dean of the Northwestern
University College of Engineering, for reading parts of the
manuscript and for offering many helpful suggestions for its
improvement. Most of all I am indebted to my wife, who
has materially lightened the burden of proofreading, and
who, at all stages in the preparation of the volume, has been 
a constant source of encouragement. 


November, 1917. 


Horace Secrist.


CHAPTER I

THE MEANING AND APPLICATION OF STATISTICS 
AND STATISTICAL METHODS 

I. Introduction 

It is coming to be the rule to use statistics and to think 
statistically. The larger business units not only have their 
own statistical departments in which they collect and interpret 
facts about their own affairs, but they themselves are con- 
sumers of statistics collected by others. The trade press and 
government documents are largely statistical in character,
and this is necessarily so, since only by the use of statistics
can the affairs of business and of state be intelligently conducted.

Business needs a record of its past history with respect to 
sales, costs, sources of materials, market facilities, etc. Its 
condition, thus reflected, is used to measure progress, financial 
standing, and economic growth. A record of business changes,
of its rise and decline and of the sequence of forces influencing
it, is necessary for estimating future developments.
This necessity extends not only to matters affecting accounts
and accounting, but also to sales, population growth, consumer
demand, transportation, sources of raw material, advertising
and display, industrial accidents and liability, capital accumulation,
income distribution, marketing possibilities, prices and
price movements, credit and banking facilities, production, etc.

Accounting alone does not meet this need. It is concerned
primarily with recording debtor and creditor relations and
financial transactions, and with balancing accounts. These 
are all necessary, but they are inadequate. They do not fully 
disclose the workings of all phases of business, nor do they 
cover all aspects of business with which management is con- 
cerned. Moreover, the method takes account more of indi- 
vidual than group transactions. It is concerned primarily with 
a summation of details into totals and with the distribution 
of accounts and financial transactions among the respective 
groups of which they are a part. It does not treat with aggregates
as such nor with the averages which serve to characterize
them. It does not deal with the “law of large numbers”
(statistical regularity) but rather with the detail out of which
the aggregates are made up. Its technique and method are
different from that which has come to be known as statistical
methodology. How different will appear more clearly as that
relating to statistics is developed in what follows.

Because it has become necessary to base economic, business,
and social policies upon facts; and because the collection, use,
and interpretation of such facts require the knowledge of a
special technique, instruction in statistical methods is necessary.
It is the main purpose of this book to serve as an introduction
to such methods. That this need is keenly felt is evident
from the fact that universities, almost without exception,
give statistics and statistical methods a place in their
curricula; and that business firms, trade and industrial asso- 
ciations, government bureaus, and others actively compete for 
the services of those whose knowledge extends to this subject. 

While it is coming to be appreciated that a knowledge of 
facts and action based upon them are necessary as a basis for 
business and social policies, this point of view is not universal.
It is still common for business men to base their
policies upon “hunches” and hearsay. The same is true in
other walks of life. Statesmen, legislators, and social workers
sometimes scout “statistics,” and support their beliefs and
programs on a less secure foundation. These arise in tradition,
customary belief, and prejudice. On the whole, however,
respect for both statistics and statistical methods is
deepening and broadening. What is now being done is closely
to observe conditions, to enumerate the frequency with which
they occur, to analyze the relations between them, and to generalize
in the light of such observations. This is as it should
be.

The study of statistics is largely concerned with methods:
methods of collecting and utilizing numerical data in order
to understand economic, business, and social problems. Its
aim is to reduce to a workable basis the methods of statistical
analysis, to state the principles which govern such analysis,
and to illustrate the ways in which the methods may be
applied to the affairs of life. It is essentially practical, yet 
is far more than vocational. Statistical methods, wherever 
applicable, are much alike. The fundamentals are the same 
wherever used; only in minor respects do the details differ. 
Their general application makes statistics a suitable subject 
for study. 

The following treatment, while primarily keeping in mind 
the needs and problems of the student and of the business man, 
is broad enough to serve as an introduction wherever statis- 
tics are used. It is assumed that the student is scientifically 
inclined, that he is without prejudice, and is open-minded. 
It is taken for granted that he wishes to understand the prob- 
lems with which he deals, to acquire a knowledge and an 
understanding of the methods by which problems may be approached
statistically, and to acquire a certain amount of
technique in dealing with them. It is also assumed that busi- 
ness men and others desire to act rationally upon the basis 
of facts, and to formulate their judgments in the light of their 
proper interpretation. 

The statistical approach to the study of the facts of life, 
however, does not preclude the use of other methods. Indeed, 
with respect to some, it has no application. Some phenomena 
cannot be quantitatively measured. Honesty of purpose, resourcefulness,
integrity, good-will (all important in industry
as well as in life generally) are not susceptible of direct statistical
measurement. Where it is applicable, there is often
too much faith placed in statistics alone. Statistics are used
as “proof,” when as a matter of fact little or nothing can be
“proved” by them. What can be done by them is to describe
problems quantitatively, break them up into their different
parts, summarize the facts about them, and prepare the way
for a logical inference. The latter, however, must be made
in part on other than statistical evidence.

While statistics do not supply conclusions, they do furnish
in part the basis on which they may be drawn. When “statistics”
are available, however, reason is frequently dispensed
with. Indeed, reasoning is sometimes thought to be equivalent
to citing “statistics.” The two, however, are not identical.
Statistics are sometimes quoted as “proof,” notwithstanding
the fact that they may (1) have no application to the problem
being considered, (2) be incomplete, and (3) be unrepresentative
and questionable in origin. Obviously, this condition
obtains when ignorance holds sway, or when design
prompts one to confuse his opponent by quoting what appears
to be irrefutable “statistics.” Moreover, not all problems can
be measured in statistical terms, nor conclusions about them
be reached by the use of statistical methods. Loose reasoning
and faulty judgments, of course, are never defensible, but
there is less excuse for them when statistics are used as “proof”
than when they are ignored. This follows because statistics
seem to be exact: the mere fact that they are expressed as
definite quantities makes them appear precise. Appearance
in this form, however, is a guaranty neither of accuracy nor
of application.

The significant thing about statistics is not so much the
numerical quantities which are attached to things counted as
is the identity of the things themselves. Indeed, the same
quantitative difference does not necessarily have the same
significance. For instance, the difference between 6 and 7
is 1. The difference between 246,789 and 246,790 also is 1,
but it is not necessarily the same 1. It is certainly not
the same proportional difference. The first may be real; the
second is probably fictitious. Only as quantities are they
alike; in significance they may be entirely different.
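
The contrast between the two differences may be made concrete by a brief computation. What follows is only a minimal sketch in Python and forms no part of the original argument; the function name relative_difference is a hypothetical helper introduced for illustration. It expresses each difference of 1 as a proportion of the smaller quantity compared.

```python
from fractions import Fraction


def relative_difference(smaller: int, larger: int) -> Fraction:
    """Return the difference between two counts as a fraction of the smaller.

    Hypothetical helper, introduced only to illustrate the text.
    """
    return Fraction(larger - smaller, smaller)


# The two pairs of quantities cited in the text.
pairs = [(6, 7), (246_789, 246_790)]

for smaller, larger in pairs:
    absolute = larger - smaller
    relative = relative_difference(smaller, larger)
    # The absolute difference is 1 in both cases; only the proportion differs.
    print(f"{smaller:>7} to {larger:>7}: absolute = {absolute}, "
          f"relative = {float(relative):.6%}")
```

In the first case the difference is one-sixth of the quantity, or roughly 17 per cent; in the second it is about four ten-thousandths of one per cent, which is smaller than the error to be expected in most enumerations of that size.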

The facts of business, economic, and social life which are 
expressed statistically are traceable to a multitude of causes. 
Rarely do they stand alone as isolated occurrences. They 
are related to other facts. They occur in sequences with re- 
spect to time, space, or condition at a given time or space. 

“A given economic fact is the result of numerous complex forces,
many of which are in a state of constant variation and react upon
one another; and of these forces only a few can be adequately described
by the method of statistics. Consequently these few are
often quoted as if they were the only active causes whereas the
effect attributed to them is probable only on the assumption that
all other causes remain unchanged or suspended. . . . Statistics,
even when compiled accurately, though often absolutely necessary
for a complete solution of a problem, do not in themselves provide
that solution, but are to be used in conjunction with evidences of
other kinds.”¹

The important steps involved in the use of statistics are:
(1) observation, (2) measurement, (3) analysis, and (4) inference.
It is the multitude of processes and methods connected
with each of these steps with which this book is
concerned. Because they are misunderstood or ignorantly
carried out, statistics are often in disrepute. The reason for
this, of course, cannot lie with the statistics. They are but
tools in the possession of the “statistician.” Like other
“weapons of defense,” they may be abused or misused. By
themselves, they carry no significance. False conclusions are
as easily supported by the use of statistics as are those which
are true. One does not have to search widely for illustrations
of this fact. For instance, in the hands of one, they are used
to “prove” that railroad rates are too high; in those of another,
that they are too low. As used by one, they seem to
support the contention that wages have advanced; in those of
another, that they have declined.

¹ McIlraith, James W., The Course of Prices in New Zealand, Government
Printing Office, Wellington, New Zealand, 1911, p. 4 of Introduction
by J. Hight.

To what conditions are these different conclusions due? 
Motive in some instances; ignorance, in others. More often, 
however, they result because the following among other fun- 
damental rules in the use of statistics are ignored: 

“Never have preconceived ideas as to what the figures are to
prove.

“Never reject a number that seems contrary to what you might
expect, merely because it departs a good deal from the apparent
average.

“Be careful to weigh and record all the possible causes of an
event, and do not attribute to one what is really the result of a
combination of several.

“Never compare data which have nothing in common.”¹

It is not our purpose at this place in the discussion to supply
a set of rules for the use of statistics. As the treatment pro- 
ceeds, this will be done in connection with the different topics 
discussed. It is, however, of interest to sketch briefly certain 
clearly marked tendencies by which beginners in the use of 
statistics and consumers of statistics are affected. Attention 
should be called to them in passing. 

(1) The tendency to accept and to use without question
any available “statistics.” They are freely quoted and cited
at length when other methods fail. Ipse dixit is often regarded
as sufficient proof. The mere fact that statistics are
in print and appear in tabulated or graphic form (the finality
of a statistical table, diagram or graph is often magical)
serves to give them sufficient sanction. Of course, they may
be inappropriate for the use to which they are put, and yet
they are “statistics.” Why not quote them when they are
available, and when to the unsuspecting they carry profound
weight? Illustrations of such tendencies are common. One
has only to recall popular addresses, to consult the daily press,
and to observe student reports in order to find examples of
this practice. Teachers observe a kindred tendency in students
to cite the statements from their textbooks as irrefutable
proof. It is one part of the teacher's task to correct, and
one portion of the student's training to overcome this tendency.

¹ Newsholme, Arthur, The Elements of Vital Statistics, London, 1892,
3d Ed., pp. 292-293.

(2) The tendency to concentrate attention on statistical 
quantities or frequencies and to ignore the units in which they
are measured. The same things or conditions are rarely
counted for any length of time. Neither are the same units
of measurement generally used at different places. The uses
which statistics are intended to serve change from period to
period. As a consequence, units of measurement also change.
Moreover, different policies prompt statistical organizations
at the same time but at different places to use different units,
to interpret them in different ways, and to insist upon different
standards of accuracy and completeness. These facts are
frequently forgotten. But they ought not to be. 

(3) The failure to remember that statistical compilations 
are generally made for definite purposes and that they can- 
not be used with the same precision for other purposes. 

(4) The tendency to ignore the fact that statistics are in 
a very real sense personal. By this is meant the fact that 
some person or organization is responsible for them: that
upon someone has been placed the responsibility of setting 
up the standards according to which they were collected, of 
determining upon the amount of error which would be toler- 
ated, of mapping out the field from which they should be 
drawn, and of deciding upon the subjects to which they apply. 
But the personnel and policies of statistical organizations 
change, and with them also the continuity of statistical series. 

(5) The tendencies to disregard detail, or to regard it as
“detail” which somehow will take care of itself and needs no
especial attention; to ignore statistical cautions respecting
the collection of data or the use of those already collected;
to speak in terms of statistical abbreviations, averages of all
types; to employ totals as if they were always more accurate
than the items which go to make them up; and to piece together
statistical fragments, gleaned from widely different
sources and compiled under widely different circumstances
and conditions.¹

But to call attention to these tendencies is not sufficient to 
correct them. More is necessary. Students need to be shown 
the consequences to which they lead. Moreover, they must be 
instructed in what the scientific uses of statistics consist. It 
is one of the purposes of this volume to put the reader in 
possession of the information, tools, and knowledge whereby 
he can use and interpret statistics intelligently. Moreover, it 
is intended to supply information which will help him to pass 
upon the merits of the statistical approach to economic, social, 
and business problems, and to undertake statistical studies 
independently. 


¹ For an admirable discussion of the false uses to which statistical data
will be put, even by those who are in a position to know their limits,
when it is a question of making a case, see Bowley, A. L., “Statistical
Methods and the Fiscal Controversy” in The Economic Journal, London,
Vol. 13, 1903, pp. 303-313. In formulating the rules to be observed,
Bowley says:

“Every statistical estimate should be considered in the light given by 
corresponding estimates for previous years. 

“Every total should be homogeneous in that quality which concerns 
the argument. 

“Where values are used, the effect of replacing them by quantities 
should be tested. 

“The errors latent in the constituents which form an estimate should 
be examined, and their effect on the estimates should be tested with 
reference to the purpose for which the estimate is used. The maximum 
adverse errors should be calculated, to see if their concurrence would 
vitiate the result. 

“The ideal measurement necessary to support each deduction should 
be conceived; and if the estimates accessible do not necessarily give the
same view as the ideal measurement, they should be rejected. 

“When the sufficiency of statistics as estimates is established, the argu- 
ments based on them should be bound to the statistical results by the 
ordinary rules of logic.” Ibid., p. 312.

II. The Meaning of Statistics and Statistical Methods

Statistics are generally thought of from two points of 
view: first, as series of numerical facts; and second, as 
methods which have to do with the collection, classification, 
tabulation, summation, abbreviation, and comparison of such 
facts for the purpose of describing or explaining the phenomena 
with which they deal. The first point of view is concerned 
with the finished product, the facts themselves; the second,
with the preparation of the raw material and with the use 
of the finished product. 

The two ways of looking at the subject are complementary. 
To secure the final product, statistics, requires the use of
methods. These are concerned primarily with the technique 
of collection — enumeration and estimation — and with summa- 
tion and abbreviation. The use of statistics — statistical meth- 
ods — closely approaches logic, concerned as it is with the 
processes and methods of formulating and testing conclu- 
sions from premises which rest solely upon statistics. The con- 
ditions which determine what shall be enumerated; the units 
which shall be used; the accuracy, completeness, and consis- 
tency which shall be insisted upon, etc., largely determine the 
methods to be used in analysis. It is an error to think of the 
two viewpoints as unrelated. They are intimately connected. 
The adequacy of a tool, or the perfection of a machine (to
speak analogously), is quite as important in the determination
of a product as is the way in which it is used. Of course, 
skillful use may in part compensate for a poor tool, as skillful
discrimination in the use of statistics may tend to correct
errors following from crude or defective enumeration or
estimation. An accurate statistical conclusion may sometimes
be reached by the use of inaccurate data. But such is
not the rule. Statistics, as methods, are as much concerned
with the preparation of the final product, statistics, as with
their use. In what follows, the principles of methodology are 
extended to both phases of the subject.

In definitions of statistics the emphasis has been variously
placed. Bowley has called statistics the “science of averages”¹
as well as “the science of counting.”² The first definition
emphasizes one device for statistical abbreviation; the
other calls attention to the enumeration which precedes analysis.
In another place, Bowley defines statistics as “numerical
statements of facts in any department of inquiry, placed in
relation to each other,” and statistical methods as “devices for
abbreviating and classifying the statements and making clear
the relations.”³ Yule defines statistics as “quantitative data
affected to a marked extent by a multiplicity of causes” and
statistical methods as “methods specially adapted to the elucidation
of quantitative data affected by a multiplicity of
causes.”⁴ Pearl defines statistics as “that branch of science
which deals with the frequency of occurrence of different kinds
of things or with the frequency of occurrence of different attributes
of things.”⁵ Still others, using the terms with less precision,
and in a less scientific sense, have sought to identify
statistics with graphic methods, to convert the science into
an art.

We shall use the term statistics as meaning aggregates of 
facts, “affected to a marked extent by a multiplicity of
causes,” numerically expressed, enumerated or estimated according
to reasonable standards of accuracy, collected in a
systematic manner for a predetermined purpose, and placed 
in relation to each other. 

This definition needs to be explained. Statistics are always 
aggregates: that is, they are made up of a number of cases. 
Isolated facts are not statistics: they may be the instances
which make up statistics, provided they relate to the same
thing over a period of time, to different attributes of things,
or to the same thing at different places or times. A single
death, an accident, a sale, a shipment does not constitute
statistics. Yet numbers of deaths, accidents, sales, and shipments
are statistics. Why? Because they are aggregates
which may be analyzed: that is, studied in relation to time,
place, and frequency of occurrence.

¹ Bowley, A. L., Elements of Statistics, P. S. King, London, 4th Ed.,
1920, p. 7.

² Ibid., p. 3.

³ Bowley, A. L., Elementary Manual of Statistics, MacDonald & Evans,
London, 1915, p. 1.

⁴ Yule, G. U., An Introduction to the Theory of Statistics, Griffin &
Company, London, 1911, p. 5.

⁵ Pearl, Raymond, Introduction to Medical Biometry and Statistics,
W. B. Saunders Company, Philadelphia, 1923, p. 19.

Moreover, statistics are “affected to a marked extent by a
multiplicity of causes.” They refer to measurements of phenomena
in a complex universe. They are related to other
measurements. They grow out of a variety of circumstances, 
differing among themselves, and are constantly subject to 
change. None of them are traceable to a single cause. 

Statistics, moreover, are numerically expressed. Quantities
not qualities are dealt with. Differences are shown by number.
For instance, crops over a series of years, expressed in
bushels harvested per acre, are statistics. The same facts indicated
by such expressions as “good,” “fair,” “medium,” “poor,”
etc., are not statistics unless a numerical equivalent is assigned
to each qualitative expression.
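
The assignment of numerical equivalents can be sketched briefly. The following Python fragment is an illustration only: the scale of 4, 3, 2, and 1 and the five ratings are invented for the purpose and are not drawn from any actual crop report, and the function name to_scores is hypothetical. It assumes that the scale is fixed in advance and applied uniformly to every rating.

```python
from statistics import mean

# Hypothetical numerical equivalents. Any such scale is a convention
# that must be settled beforehand and applied to every observation alike.
SCALE = {"good": 4, "fair": 3, "medium": 2, "poor": 1}


def to_scores(ratings):
    """Convert qualitative ratings to their assigned numerical equivalents."""
    return [SCALE[rating] for rating in ratings]


# Invented ratings for five successive years (illustrative data only).
ratings = ["good", "poor", "medium", "fair", "good"]
scores = to_scores(ratings)

print(scores)        # the ratings expressed as numbers
print(mean(scores))  # a summary which the bare words could not supply
```

An average obtained in this way is only as meaningful as the scale behind it; a different assignment of equivalents would yield a different result from the same ratings.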

Statistics, if they are to serve as the basis for a logical
conclusion, and are to be combined, averaged, and summarized,
must be enumerated or estimated according to reasonable
standards of accuracy. Moreover, the same standards
must obtain throughout the whole process of collection. What
standards are “reasonable” depends upon the purpose which
the statistics are to serve. No absolute criterion can be
established for all cases. Where precision is required, accuracy
is necessary; where general impressions are sufficient,
appreciable error may be tolerated. 

Then, too, if quantitative measurements are truly to be
called “statistics,” they must be made in a systematic manner
in keeping with a given purpose. The purpose for which things
are counted, or measurements and estimates made, will always
determine the standards followed. If the purpose changes,
quantities may still be secured, but they refer to different
things, or to the same thing in different ways, or to different 
degrees. They cannot be treated statistically and become the 
basis for valid conclusions. 

For quantities to be called statistics, moreover, they must
be capable of being placed in relation to each other. This
may be done in point of time, of place, or of condition. That
is, the term suggests comparison, and in order for things to
be compared, they must have qualities in common. Indeed,
as Bowley says, “Like can only be compared with like.”¹
Stray and loose bits of quantitative information, hearsay, and 
unrelated material, gleaned here and there from indiscriminate 
sources, having no common basis of selection, while numerical, 
can be termed statistics only by a confusion of terms. If they 
are aggregates, homogeneous in the qualities necessary for 
comparison, then they may be called statistics, but not other- 
wise. 

So much for the definition of statistics. But the term is 
used in another sense. It is sometimes spoken of as a science. 
In this usage, it refers to a method or to methods of dealing 
with the frequencies with which different things, or different 
attributes or characteristics of things occur. In some cases, 
it is spoken of as a method;² in others, as methods. We shall
use the term in the plural. 

Statistical methods include all those devices of analysis 
and synthesis by means of which statistics are scientifically 
collected and used to explain or describe phenomena either in 
their individual or related capacities.

¹ Bowley, A. L., “The Improvement of Official Statistics,” in the
Journal of the Royal Statistical Society, September, 1908, Vol. 71, p.
467.

This article is reprinted in the author's Readings and Problems in
Statistical Methods, Macmillan & Company, New York, 1920, pp. 150-159.

² “The statistical method is that which deals with assemblages, or
groups, in terms of the averages by which they may be described, and
deals with relations which are not described by unchanging laws but
by generalizations couched in terms of approximations and of probability.”
Mills, Frederick C., “On Measurement in Economics,” in The
Trend of Economics, Knopf, New York, 1924, pp. 38-39.
These methods have to do with the processes of (1) selecting
and collecting data, (2) classifying them according to
their common characteristics, (3) recording and illustrating 
the instances in keeping with a scheme of classification, (4) 
summarizing or abbreviating the detail by the use of averages, 
and (5) measuring the relationship which obtains between 
them. These are the methods with which the remaining part 
of this volume is concerned. 
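
The five processes may be sketched, in miniature, as follows. The store records are invented, and the correlation coefficient is used here only as one possible measure of relationship; the passage itself names no particular measure.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# (1) Selecting and collecting: invented records of (district, sales, expenses).
records = [
    ("North", 120.0, 30.0),
    ("North", 100.0, 26.0),
    ("South", 80.0, 24.0),
    ("South", 60.0, 21.0),
]

# (2) Classifying according to a common characteristic, here the district.
classes = defaultdict(list)
for district, sales, expenses in records:
    classes[district].append((sales, expenses))

# (3) Recording the instances in keeping with the scheme of classification.
table = {d: len(items) for d, items in classes.items()}

# (4) Summarizing the detail by the use of averages.
average_sales = {d: mean([s for s, _ in items]) for d, items in classes.items()}

# (5) Measuring the relationship between two series (assumed measure:
#     the product-moment correlation coefficient).
def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

r = correlation([s for _, s, _ in records], [e for _, _, e in records])
print(table, average_sales, round(r, 3))
```

Each step depends on the one before it: averages are only as sound as the classification, and the classification only as sound as the collection.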

III. The Use and Application of Statistics
and Statistical Methods

Statistics are now collected on most important business and
social problems. Indeed, we are surfeited with statistics.
Some of them will satisfy our definition; others will not. This
does not mean, however, that there is no dearth of statistics.
There is. On many problems we have no adequate data.
There is an abundance in some fields, and a scarcity in others.
This condition is due to the growing need for information,
part of which cannot be collected until plans have been developed.
It is also due to the overlapping jurisdictions and conflicting
purposes of public and private statistical organizations.
Moreover, private purposes and transient needs prompt
collections to be made, the series being discontinued as soon as
the need is met, or changed in scope and meaning as soon as
the purpose is served. The production of statistics is in a
chaotic state; their use is hardly less haphazard.

But progress is being made. This extends first to their use.
They are being employed, and this fact is significant. Discriminating
use will come with an appreciation of their meaning
to trade, industry, and the state, and with the development
of skilled workers who know how to employ them. Second,
progress is also being made in standardizing the methods of
collection and presentation. Government departments, learned
societies (such as the American Statistical Association), research
organizations, etc., are all co-operating not only to improve and
extend the types of data collected but also to develop
a technique of methodology in their use. The prospects are
encouraging because statistical method is a “working tool of
science. It is probably of wider utility than any other single
tool which science has discovered or devised. For it has an
applicability and a usefulness, direct or indirect, in virtually
every problem. It is, in short, a fundamental element of scientific
methodology.”¹

And yet, it is but one method. There are others which are 
often helpful in the explanation of phenomena. It has its limi- 
tations. It takes account only of quantitative and not of quali- 
tative differences. It is not of universal use or validity. Yet 
when other methods are employed, statistics may often be 
used in a corroborative way. Indeed, it is in this respect that 
they probably have their greatest value. 

This, however, does not mean that the function of statistics
is limited to particular kinds of questions. There are few
problems relating to business, social policy, or statecraft for
an understanding of which statistics are not required. There 
is need everywhere for an appreciation, measurement, and 
analysis of facts in their quantitative aspects, for the ability 
accurately to observe the conditions to which they are trace- 
able, for a determination logically and scientifically to piece 
them together, so that from them conclusions can be drawn 
which will become the basis for a program looking toward 
economic and social progress. 

The fields of application of statistics and statistical methods, 
even to problems of economics and business alone, are too
broad and varied to be described at this place. Some of them 
have already been mentioned. It may be helpful, however, to 
enumerate the types of problems which may be statistically 
studied. The subsequent discussion and illustrations will serve 
more definitely to develop the precise manner in which they 
may be and are being studied. 


¹ Pearl, Raymond, op. cit., p. 21.
1. Application to Individual Business Units. A study of:
   (1) Prices.
   (2) Production by departments, processes.
   (3) Sales and sales possibilities by districts, by periods, by
       products.
   (4) Employment, as to rapidity of turnover, scale of wages,
       labor supply, types of welfare work.
   (5) Factory organization and stock control.
   (6) Margins on different goods.
   (7) Costs; results of management policies; avenues of distribution;
       advertising methods and results; layout; price policies;
       trade practices; consumer-demand; credit risks;
       size, frequency, etc., of customer-purchases.
   (8) Profits, gross and net, by periods, by departments, by
       products.

2. Application to Groups of Business Units. Studies of this character
   might extend, among other things, to comparisons of:
   (1) Production. These would include:
       a. amounts and proportions of land, labor, and capital.
       b. expenses incurred and their distribution.
       c. materials used: sources, amounts, costs, shipments,
          storage, inventories, purchases.
       d. output: amounts, types, costs, distribution.
   (2) Finances:
       a. prices.
       b. capital requirements, source, kinds.
       c. relation of current assets to current liabilities.
   (3) Expenses:
       a. overhead, current, selling.
       b. relation of each expense to sales and to total expenses.
   (4) Margins:
       a. on different goods.
       b. in relation to sales.
   (5) Turnover of
       a. merchandise, by lines.
       b. capital.
       c. accounts receivable.
       d. inventories.
   (6) Profits, gross and net. Relation to
       a. total capital.
       b. sales.
       c. net worth.

3. Application to Matters of General Business Growth, Decline and
   Change. Under this head fall such topics as the following:
   (1) Production.
       a. production: value, quantities, and grades.
       b. stocks of goods: sight, and potentially available.
       c. shipments.
       d. consumption.
   (2) Prices, money, and credit.
       a. banking activity: loans, discounts, debits, clearings.
       b. credit: interest rates, security issues and prices.
       c. security markets.
   (3) Labor supply and compensation.
       a. employment and unemployment.
       b. immigration, emigration, labor turnover, wage rates.
   (4) Economic waste of
       a. materials.
       b. human resources.
       c. transportation.
   (5) Characteristic features and sequence of economic factors
       during periods of
       a. prosperity.
       b. liquidation.
       c. stagnation.
       d. recovery.

4. Application to Questions of Social Economy.
   (1) Poverty, crime, dependency.
   (2) Consumption of goods and spending of incomes.
   (3) Growth, decline, and movements of population.
   (4) Mortality, sickness, accidents.
   (5) Occupational distribution and adjustments.
   (6) Farm and home ownership, tenancy.
   (7) Distribution of wealth and income.
   (8) Conservation of natural resources.
   (9) Methods of wholesale and retail distribution.
   (10) Public expenditures, debt, taxes.

5. Application to Affairs Pertaining to Governmental Discrimination
   and Policy.
   (1) The determination of the benevolent or malevolent effects
       of given state policies, such as those pertaining to tariff,
       use of natural resources, price fixing, public ownership and
       control.
   (2) The determination of “fair values” and “reasonable returns”
       as bases for the exercise of administrative discrimination
       and the shaping of governmental policy.
   (3) The supervision of private business methods, looking toward
       the insuring of competition, the regulation of monopoly,
       the guaranteeing of favorable conditions of employment.
   (4) The evaluation of properties as a basis for taxation,
       condemnation, and forced sale.
   (5) The recording of domestic and foreign trade movements,
       estimating national wealth and its distribution, recording
       national progress so far as revealed statistically.

6. Application to Questions of Economic Theory. 

The science of economics is becoming statistical in its
method.¹ The advice of Richard Jones to “Look and see” is
being taken literally. Accordingly, in the study of the law 
of demand, for instance, recourse is being made to statistics 
of markets where demand is indicated in the prices paid and 
amounts purchased. Similarly, supply is studied with respect 
to costs, these being measured in standard units. Market 
analyses and cost studies are now becoming commonplaces, 
albeit that they are for the most part undertaken only by the 
larger business units, and are far too often unscientifically 
carried out. The significant thing is that they are being 
made. Improvement will come in time. Just as fast as busi- 
ness men, singly or in groups, come to realize that there are 
basic principles which lie behind the daily routine of pricing, 
producing, and selling, for instance, which may be discovered 
and stated, just so fast will they seek for and be guided by 
such principles. 

Jevons, in 1871, stated the problem clearly. He said, “I
know not when we shall have a perfect system of statistics,
but the want of it is the only insuperable obstacle in the way
of making Economics an exact science.”² Keynes says that

¹ Tugwell (Editor), The Trend of Economics, Alfred Knopf, New York,
1924, Chapters I and II, pp. 3-34, and 37-70, respectively.

² Jevons, W. Stanley, The Theory of Political Economy, Macmillan &
Company, New York, 4th Ed., 1911, p. 12.
the function of statistics is “first, to suggest empirical laws,
which may or may not be capable of subsequent deductive
explanation; and secondly, to supplement deductive reasoning
by checking its results, and submitting them to the test of experience.”¹
Professor Moore’s Laws of Wages is an excellent
example of the use of statistics and statistical methods in the
development of economic theory. Stating his purpose, he says,
“I have endeavored to use the newer statistical methods and
the more recent economic theory to extract, from data relating
to wages, either new truth or else truth in such new form
as will admit of its being brought into fruitful relation with
the generalizations of economic science.”²

The use of statistics and statistical methods for these purposes,
while possessing great possibilities in the hands of the
well-trained statistical economist, offers few opportunities to
the readers to whom this volume is addressed.³

¹ Keynes, J. N., Scope and Method of Political Economy, 2d Ed., revised,
Macmillan & Co., London, 1897, p. 338.

² Moore, H. L., Laws of Wages, Macmillan & Company, New York,
1911, p. a

³ It may be of general interest to list some of the economic subjects
with respect to which statistics have been used to discover “laws” or
tendencies. Among these are the following: the business cycle, competition,
consumption, distribution of wealth and income, population
growth, prices, production, rents, trade, unemployment, wages, etc. There
is an extensive literature pertaining to these subjects. Those who are
interested may consult the following among other writings:

ON THE BUSINESS CYCLE 

Hansen, Alvin H., Cycles of Prosperity and Depression in the United 
States, Great Britain, and Germany — A Study of Monthly Data, 
1902-1908, Madison, Wisconsin, 1921. 

Business Cycles and Unemployment, McGraw-Hill, New York, 1923. 
Mitchell, Wesley C., Business Cycles, Univ. of California, Berkeley, 
1913. 

Moore, H. L., Economic Cycles, Their Law and Cause, Macmillan
Company, New York, 1914.

Moore, H. L., Generating Economic Cycles, Macmillan & Company,
New York, 1923.

Moore, H. L., Forecasting the Yield and the Price of Cotton, Macmillan
& Company, New York, 1917.

Persons, W. M., “The Construction of a Business Barometer based upon
Annual Data,” in American Economic Review, December, 1916, pp.
739-769.
With this introduction, the purpose of which is to open up
the subject, to define its boundaries, and to suggest the nature 
of the uses of statistics and statistical methods, we pass im- 
mediately, in Chapter II, to a consideration of Types of Sec- 
ondary Statistical Data and Tests for their Use. 

(Note 3 continued) 

Persons, Foster and Hettinger (Editors), The Problem of Business
Forecasting, Houghton Mifflin, Boston, 1924, passim.

Review of Economic Statistics, Harvard Economic Service, Cambridge,
Mass., especially the numbers for January and April, 1919; July,
1923; January, 1924.

ON COMPETITION, COSTS, DEMAND, AND PROFITS

Schultz, Henry, “The Statistical Measurement of the Elasticity of
Demand for Beef,” Journal of Farm Economics, July, 1924, pp.
254-278.

Secrist, Horace, “Competition in the Retail Distribution of Clothing:
A Study of Expense or ‘Supply’ Curves,” Bureau of Business Research,
Northwestern University, Chicago, 1923.

Secrist, Horace, “Expense Levels in Retailing: a Study of the ‘Representative
Firm’ and of ‘Bulk-Line’ Costs in the Distribution of Clothing,”
Bureau of Business Research, Northwestern University, Chicago,
1924.

Simpson, Kemper, “A Statistical Analysis of the Relation between Cost
and Price,” Quarterly Journal of Economics, 1921, pp. 264-287.

Simpson, Kemper, “Further Evidence on the Relation between Price,
Cost, and Profit,” Quarterly Journal of Economics, February, 1923,
pp. 476-490.

Taussig, F. W., “Price Fixing as Seen by a Price Fixer,” Quarterly
Journal of Economics, February, 1919, pp. 205-241.

Wright, Philip G., “Value Theories Applied to the Sugar Industry,”
Quarterly Journal of Economics, November, 1917, pp. 101-121.

Wright, Philip G., Sugar in Relation to the Tariff, McGraw-Hill, New
York, 1924, pp. 106-130; 276-284.

ON CONSUMPTION

Ogburn, W. F., “Analysis of the Standard of Living in the District of
Columbia in 1916,” in Quarterly Publications of the American Statistical
Association, June, 1919, pp. 374-392.

ON DISTRIBUTION OF WEALTH AND INCOME

Income in the United States: Its Amount and Distribution, 1909-1919.
National Bureau of Economic Research, New York, Vol. I, 1921;
Vol. II, 1922.

ON POPULATION GROWTH

Pearl, Raymond, and Reed, Lowell J., Predicted Growth of the Population
of New York and Its Environs, New York, 1923.

ON PRICES

Fisher, Irving, The Purchasing Power of Money, Macmillan & Company,
New York, 1911.
CHAPTER II


TYPES OF SECONDARY STATISTICAL DATA AND 
TESTS FOR THEIR USE 

I. Introduction

For statistics to be used, they must be available. Indeed, 
the way in which they are used is determined by the conditions
which have been or may be followed in collecting or as-
sembling them. Statistics do not come into being of and by 
themselves. They are not collected without a purpose. Those 
which are now available were originally intended to serve 
some end, notwithstanding the fact that it may not be ap- 
parent to the user and may be foreign to the needs of a particu- 
lar time, place, or condition. This must not be forgotten. 
Likewise, those which are in process of collection, or are to be 
collected, will be chosen because of their suitability to a definite
purpose.

At any given time or place, or under any condition, the col- 
lection of statistical data presupposes certain standards of 
accuracy, completeness, and comparability. What these are 
for any group of data depends upon (1) the purpose in mind,
(2) the character of the data themselves, (3) the bases of
selection and omission, (4) the integrity, honesty, and organization
of the collecting body, (5) the basis of classification used
in grouping them, (6) the clerical accuracy used in their compilation,
and (7) the adherence to uniform units or terms in
which the quantities are expressed.

Statistics are found either as a “finished product” or as “raw
material.” In the first form, they appear in the trade press,
government documents, newspapers, annual reports of banks,
corporations, etc.; in the second form, in the transactions of
business, the processes of industry, movements of population,
etc. They are constantly in a state of “manufacture.” The
finished product of to-day is the raw material of yesterday. 

This chapter has to do: first, with a brief description of 
the chief sources of secondary statistical data, that is, with 
those already available; and second, with the tests which 
should be applied to such data before they are used. 

It is not our intention to furnish an exhaustive list of the
different types of secondary statistical data, nor to indicate
all of the places where they may be found. Neither shall we
attempt to give a complete list of the private and public
organizations which collect such data. The present output
of statistics is enormous. It applies to a vast and constantly
changing number of subjects, and is of different value at the
same time at different places, and at different times at the
same place. To write a critique of secondary data would be
an extremely difficult if not impossible task. Moreover, it
would be of little permanent value, since the methods which
govern their collection change from time to time in the light
of the particular needs and standards of the bodies responsible
for them. This much, however, may be said: the value
of the output is improving; statistical organizations, both
public and private, are being placed on a substantial and
permanent basis; and statistical data, because of the use to
which they are put, are being subjected to critical tests. These
have to do, among other things, with completeness, accuracy,
and uniformity. But more concerning them presently.

Statistics, as indicated above, are numerical aggregates having
certain well-defined properties. They are syntheses¹ made

¹ “When we are investigating the nature and causes of things and
events in the natural and social sciences, we are face to face with facts.
In statistics about those events we are brought face to face with
syntheses. The statistician must regard his figures as a sort of symbol,
whose character and significance are more or less enigmatic; and he
must diligently seek out all the probable causes of the facts he has
symbolized before him, with a view to their scientific explanation.” P.
Coffey, The Science of Logic, Longmans, London, 1912, Vol. II, p. 287.
up of individual instances. Moreover, they are derivative in
the sense that they numerically measure phenomena as they
appear to an observer. The identity of the parts of even the
simplest statistical aggregate must be established. Identification
requires that “earmarks” shall be distinguished, and that
they shall always appeal in the same way to those who are
responsible for making the selection. To count such simple
things as bushels of wheat, for instance, appears to be easy.
Yet it is not always clear what is meant by a “bushel,” nor
what is included in the term “wheat.”¹ Similar observations
may be made about any statistical data. The important points
to be considered are: (1) what are counted, (2) are the same
things always included, (3) who did the counting, and (4) for 
what purpose was the counting made? These topics need to 
be more fully considered. The discussion for the moment, 
however, has to do with the distinction between primary and 
secondary data. It will later include (1) the chief sources of 
secondary data, and (2) the tests to which such data should 
always be subjected before they are used. 
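
The four points just listed can be kept with a body of data as a simple record, so that unanswered questions remain visible. The sketch below is only an illustration: the class name, field names, and the sample entry are hypothetical and are not drawn from any actual source.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Provenance:
    """Answers to the four questions; None means 'not yet established'."""
    what_is_counted: Optional[str] = None
    same_things_always_included: Optional[bool] = None
    who_did_the_counting: Optional[str] = None
    purpose_of_counting: Optional[str] = None

def open_questions(p):
    """Return the names of the questions still unanswered."""
    return [f.name for f in fields(p) if getattr(p, f.name) is None]

# Hypothetical entry for a series of wheat figures.
wheat = Provenance(
    what_is_counted="bushels of wheat, by test weight",
    who_did_the_counting="hypothetical state crop reporter",
)
unanswered = open_questions(wheat)
print(unanswered)
```

A series with open questions is not thereby useless, but the user is at least warned of what has not been established.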

II. Primary and Secondary Data Defined and
Contrasted 

It is necessary to define, more accurately than has been 
done above, what is meant by secondary data. By “secondary 
data” are meant those which have been collected, tabulated,
and presented in simple or complex form for any purpose
whatsoever. They generally appear as totals or percentages,
removed one or more steps from the form in which they were
reported. Consequently, they do not show on their face (1)
the peculiarities of the units employed, (2) the purpose or purposes
for which collected and used, (3) the way in which they
have been edited, combined, and grouped, nor (4) the adjust- 

¹ See the interesting study by Boerner, E. G., “Improved Apparatus for
Determining the Test Weight of Grain, with a Standard Method of
Making the Test,” Bulletin No. 472, U. S. Department of Agriculture,
October, 1916.



ments which have been made in the original data in order 
that they might be used for the purpose in mind. They are 
truly “secondary.” They have been carried through certain 
manipulations, the extent and character of which are not gen- 
erally disclosed. 

In contrast with such data are those which are called 
primary. By “primary data” are meant those which are 
original: that is, those in which little or no grouping has been 
made, the instances being recorded or itemized as encountered.
They are essentially raw material. They may be combined, 
totaled, and averaged; but they have not extensively been so 
treated. 

Of course, the distinction between primary and secondary
data is largely one of degree. Data which are secondary in 
the hands of one party may be primary in the hands of an- 
other. Illustrations will make this clear. To the Federal Re- 
serve Bank of Chicago, for instance, the reported debits to 
individual accounts of the member banks are primary data. 
To one reading the report of the bank showing the total debits 
for the district, they are secondary. To the general public,
the death rates published by the Board of Health of Chicago 
constitute secondary data. In the hands of the statistician of 
this Board, they are primary data. Moreover, to the Bureau 
of Business Research, Northwestern University, the records of 
sales, expenses, inventories, etc., secured from the books of retail
n^t establishments, are primary data. When these same
facts are published by the Bureau, in interpretive studies, they 
become secondary. Wherein lies the distinction? Essentially, 
in the fact that the data before publication have been edited 
for completeness, accuracy, comparability, consistency; they 
have been combined into groups, averaged, summarized, ex- 
pressed as percentages, etc. They have been “worked over” 
for a purpose; they have lost the individual characteristics 
which they possessed as primary data when reported. 
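
The passage from primary to secondary form can be sketched as follows. The store records and the particular summary chosen (totals and an expense percentage) are invented for illustration; they are not the Bureau's actual figures or procedure.

```python
# Primary data: invented itemized records, one per reporting store.
primary = [
    {"store": "A", "sales": 5000.0, "expenses": 1250.0},
    {"store": "B", "sales": 8000.0, "expenses": 2400.0},
    {"store": "C", "sales": 7000.0, "expenses": 1750.0},
]

def summarize(records):
    """Combine the itemized records into totals and a percentage."""
    total_sales = sum(r["sales"] for r in records)
    total_expenses = sum(r["expenses"] for r in records)
    return {
        "stores": len(records),
        "total_sales": total_sales,
        "expense_pct_of_sales": round(100 * total_expenses / total_sales, 1),
    }

# Secondary data: the "worked over" form in which the facts are published.
secondary = summarize(primary)
print(secondary)
```

Nothing in the summary shows that store B's expense ratio differs from the others; the individual characteristics are no longer recoverable from the published figures.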

But even so-called “primary data” are in reality secondary 
to the degree to which they have been “worked over” in the 
process of gathering. While the distinction between the two
is largely one of degree, it is none the less important. It is
significant because the more secondary data become, the more
specialized is their function, and the more difficult is it to use 
them for purposes other than those for which they have already 
been used. Each successive use is made for a purpose, and 
carries with it new and different bases for combinations, ad- 
justments, omissions, etc. 

III. Sources of Secondary Statistical Data

The chief sources of secondary statistical data are the
periodic and occasional reports of (1) national, state, and
city departments, bureaus, and commissions, (2) trade associations
and private organizations, (3) research agencies, and (4)
technical periodicals.¹ Space is available for listing only a
few of the representative sources falling under each of these 
headings, and for indicating the scope of the statistical mate- 
rial issued. 

It is not in keeping with our purpose to compile a catalog 
of statistical sources, neither is it to our interest to make a 
compilation of the statistical material which is or might be of 
interest to students of business and social affairs. A certain 
amount of the foraging or exploring instinct, and at least a 
general knowledge of what data are likely to be available in 
the sources to which reference is made, are presupposed on the 
part of the person who has occasion to use published statistics. 
If such knowledge is lacking, it may be easily acquired by those 
who really seek it. 

But it is inadequate alone to know the sources of statistical 
data. More is needed. The ability to pass judgment on the 

¹ For a list of the main agencies, both public and private, together
with a description of the nature of the data published, the name of the
publication in which they are contained, and the date of publication, see
Survey of Current Business, Monthly Supplement to Commerce Reports,
United States Department of Commerce. This Survey, published
monthly by the United States Department of Commerce, contains a
selected body of data on matters pertaining to business.
value of such data is also necessary. In addition to both,
training is required in the scientific use of the data for the pur- 
poses desired. It is primarily the last aspect of the problem 
in which our interest lies. 

A List of Some of the More Important Sources of Secondary
Statistical Data 

The Federal Government

U. S. Department of Agriculture
    Bureau of Agricultural Economics
    Bureau of Animal Industry
    Forest Service
U. S. Department of Commerce
    Bureau of the Census
    Bureau of Foreign and Domestic Commerce
    Bureau of Navigation
U. S. Department of the Interior
    Bureau of Mines
    Geological Survey
U. S. Department of Labor
    Bureau of Immigration
    Bureau of Labor Statistics
U. S. Treasury Department
Federal Reserve Board
Federal Trade Commission
Interstate Commerce Commission

The State Governments 

Illinois Department of Labor, Springfield 

Massachusetts Department of Labor and Industries, Boston 

New York State Department of Labor, Albany 

Pennsylvania Department of Labor and Industry, Harrisburg 

Wisconsin Industrial Commission, Madison 

Wisconsin Tax Commission, Madison 

Research Agencies

University

Brown University, Bureau of Business Research, Providence, R. I.
Carnegie Institute of Technology, Dept. of Commercial Engineering,
    Pittsburgh, Pa.
Harvard University, Bureau of Business Research, Cambridge, Mass.
New York State College of Agriculture, Cornell University,
    Department of Agricultural Economics and Farm Management,
    Ithaca, N. Y.
New York University, Bureau of Business Research, New York, N. Y.
Northwestern University, Bureau of Business Research, Chicago, Ill.
University of Colorado, Bureau of Business and Governmental
    Research, Boulder, Colorado
University of Illinois, Bureau of Business Research, Urbana, Ill.
University of Nebraska, Committee on Business Research,
    Lincoln, Neb.
University of Oregon, Bureau of Business Research, Eugene, Oregon
University of Pennsylvania, Industrial Research Department,
    Wharton School of Finance and Commerce, Philadelphia, Pa.


Other

American Institute of Agriculture, Chicago, Ill.
Bureau of Railway Economics, Washington, D. C.
Food Research Institute, Stanford University, California
Institute for the Study of Land Economics, Madison, Wis.
Institute of Economics, Washington, D. C.
International Institute of Economics, New York, N. Y.
Life Insurance Sales Research Bureau, New York, N. Y.
National Bureau of Economic Research, New York, N. Y.
National Industrial Conference Board, New York, N. Y.
Russell Sage Foundation, New York, N. Y.

Trade Associations and Private Organizations 

American Face Brick Association, Chicago, Ill.
American Newspaper Publishers' Association, New York, N. Y.
American Iron and Steel Institute, New York, N. Y.
American Railway Association, New York, N. Y.
Automobile Manufacturers' Association, Chicago, Ill.
Chicago Board of Trade, Chicago, Ill.
F. W. Dodge Corporation, Boston, Mass.
National Association of Farm Equipment Manufacturers, Chicago, Ill.
National Automobile Chamber of Commerce, New York, N. Y.
New York Coffee and Sugar Exchange, New York, N. Y.
Portland Cement Association, Chicago, Ill.
Silk Association of America, New York, N. Y.
United Typothetae of America, Chicago, Ill.

This list of sources of secondary data refers to statistics of 
interest primarily to the business man and student of busi- 
ness. It is not intended to be complete. Reference should 
also be made to the matter contained in the footnote below.^ 
These sources contain statistical data of the “secondary”
sort. To pass judgment upon their merits even for a specific 
purpose would involve an enormous amount of study and
discrimination, since each collection has its own peculiarities and
is collected with a given end in view. To judge of their value 

^For an account of the sources of statistics on produce markets, see
Mudgett, Bruce D., “Current Sources of Information in Produce
Markets,” in Annals of the American Academy of Political and Social
Science, Vol. XXXVIII, No. 2, pp. 104-125. On some of the private
organizations regularly collecting and issuing statistical data, see
Parmelee, Julius H., “The Utilization of Statistics in Business,” in
Quarterly Publications of the American Statistical Association, June, 1917,
pp. 565-576. See also Haney, Lewis H. and Meyer, O. C., Source Book
of Research Data, Prentice-Hall, New York, 1923; West, Carl J.,
Market Statistics, U. S. Department of Agriculture, Washington, D. C.,
Bulletin 982, June, 1921; Statistical Abstract, Department of Commerce,
Washington, D. C.

The student who has or wishes to cultivate an interest in statistics 
pertaining to business should regularly consult the following, among 
other, publications:

The Federal Reserve Bulletin, The Federal Reserve Board, Washington, D.C.
The Monthly Reviews of Business Conditions, The Respective Federal 
Reserve Banks. 

The Monthly Labor Review, U. S. Department of Labor, Washington, D.C. 
The Review of Economic Statistics, Harvard Committee on Economic 
Research, Cambridge, Mass. 

Harvard Economic Service, Harvard Committee on Economic Research, 
Cambridge, Mass. 

The Brookmire Economic Service, New York, N. Y. 

Babson Statistical Service, Wellesley Hills, Mass. 

Dun's Review, R. G. Dun, New York, N. Y.

Bradstreet's, The Bradstreet Company, New York, N. Y.

The Annalist, New York, N. Y. 

Moody's Investors Service, New York, N. Y.

Commercial and Financial Chronicle, Wm. B. Dana, New York, N. Y. 

The Journal of the American Statistical Association, Columbia
University, New York, N. Y.
for general purposes is impossible, because no criteria of
distinction are offered. Yet, it is not impossible to point out
certain tests to which they should all be subjected before they
are used. It is the purpose of the following section to outline 
such a series of tests. 

IV. Tests to be Applied to Secondary Statistical 
Data Before They are Used 

The inquiries which should always be made about second- 
ary data relate to (1) the organization which supplies the 
data, (2) the purpose for which they are issued and the con- 
sumers to whom they are addressed, (3) the nature of the 
data themselves, (4) the units in which expressed, (5) their 
accuracy, (6) the extent to which they refer to homogeneous 
conditions, and (7) their application to a given problem. 
Each of these topics requires special consideration. 

1. THE ORGANIZATION SUPPLYING SECONDARY DATA

Every statistical organization is created for a purpose and 
has a special function to perform. Some are public, some semi- 
public, and others private. Some are old and have well-estab- 
lished standards of excellence; others are relatively new — are 
struggling to secure information, and trying to present it in 
a form suitable to a special clientele. Some are adequately
financed and have proper entree to sources of information; 
others are financially embarrassed and must be content to 
secure information from any source available. Some have 
legal sanctions to compel information to be furnished in keep- 
ing with a carefully prepared plan relating to each detail cov- 
ered; others must be content with information gratuitously 
furnished, and in a form which suits the interest, prejudice, 
or peculiar records of informants.

If these and other differences characterize organizations 
which publish statistical data, then the person who has occasion
to use such material must ask, and answer to his own
satisfaction, the following, among other, questions:
(1) What types of organizations issue the data desired?

(2) Is there a choice between them?

(3) What standards of excellence obtain in their collection, and in
their interpretation?

(4) Is there anything in the nature of the organization which might
prejudice the data in any vital particular?

Some information about all of these inquiries is available.
It may be difficult to secure, and be incomplete, yet, to any one 
who really desires it, methods are available by which it may be 
secured. Any responsible statistical organization is glad to 
describe its form of organization and its methods. 

2. THE PURPOSE FOR WHICH SECONDARY DATA ARE ISSUED AND
THE CONSUMERS TO WHOM THEY ARE ADDRESSED

Whatever may be the type of its organization, each statistical
body has its own policy and its particular purpose. Accordingly,
there is generally some basis for a choice between
sources, notwithstanding the fact that they appear to present 
the same or similar data, and to serve the same clientele. 
Choice will generally depend more upon the purpose which an 
organization serves than the type of the organization itself. 
These purposes may be:

(1) General or specific 

(2) Restrictive or inclusive 

(3) Transient or permanent 

(4) Scientific or unscientific 

Because of these differences in the purposes for which data 
are collected and published, secondary data ought not to be 
used indiscriminately. They are good or bad, satisfactory or 
unsatisfactory, in the light of the purpose which controlled 
their collection or selection, their grouping and combination, 
and the analysis which has been made of them. 

3. THE NATURE OF THE SECONDARY DATA THEMSELVES 

In the use of secondary data, after the type of organization 
which issues them and the purposes which they are intended to 
serve have been determined, the data themselves must be
examined. The following among other facts should be considered:

(1) Are the data biased? Bias may be due to (a) wilfully
eliminating parts of the facts, (b) basing comparisons upon
insufficient data, or (c) relating them to unrepresentative
periods or conditions. One who is prompted by motives to deceive
finds little difficulty in making out a case from data which,
if otherwise used, would tell a different story. If samples
are chosen according to chance, an accurate account may be
secured from comparatively few data. If, on the other hand,
choice is biased, the effect of increasing the number of samples
serves to increase the amount of error. No use should be
made of secondary data until the question of bias is settled.

(2) Are the data samples only, relating to (a) restricted 
groups or characteristics, (b) certain territories, (c) particular 
times; or are they complete for the subject matter to which 
they relate? 
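
The contrast drawn under (1) between chance selection and biased selection can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a method taken from the text: the population, the fixed seed, and the biased rule ("select only from the upper half") are invented for illustration. It suggests that the error of a chance sample tends to shrink as cases are added, while the error of a biased sample persists however many cases are taken.

```python
import random
import statistics


def sample_error(population, n, biased, rng):
    """Absolute error of a sample mean measured against the population mean."""
    true_mean = statistics.fmean(population)
    if biased:
        # Hypothetical biased rule: select only from the upper half of values.
        pool = sorted(population)[len(population) // 2:]
    else:
        # Chance selection: every case is equally likely to be drawn.
        pool = population
    sample = rng.sample(pool, n)
    return abs(statistics.fmean(sample) - true_mean)


rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.gauss(100, 15) for _ in range(10_000)]  # invented data

for n in (25, 400, 4000):
    chance = sample_error(population, n, biased=False, rng=rng)
    skewed = sample_error(population, n, biased=True, rng=rng)
    print(f"n={n:5d}  chance error={chance:6.2f}  biased error={skewed:6.2f}")
```

Adding cases to the biased sample merely confirms the wrong figure with greater apparent precision, which is why the question of bias has to be settled first.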

Are all instances or frequencies included, or are samples
selected: that is, are data inclusive or exclusive? Samples,
in the very nature of the case, are generally used. The entire
“population” (that is, all of the instances), save in studies
based upon counts, is rarely included. Sampling, moreover,
has to do with given times, classes or characteristics, and
places. What bases of selection have been employed? How
nearly do the samples describe the conditions to which they
relate? A satisfactory sample must contain the characteristics
common to the entire population and these must be represented
in the same proportions as they are found in the material
sampled.
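
The requirement that characteristics be represented "in the same proportions as they are found in the material sampled" can be sketched as a proportional allocation of a sample among classes. The function name, the class labels, and the counts below are hypothetical, chosen only to show the arithmetic; the largest-remainder rounding is one reasonable choice among several.

```python
def proportional_allocation(class_sizes, sample_size):
    """Split sample_size among classes in proportion to their size."""
    total = sum(class_sizes.values())
    quotas = {k: v * sample_size / total for k, v in class_sizes.items()}
    # Start from the whole-number part of each quota.
    alloc = {k: int(q) for k, q in quotas.items()}
    # Hand any remaining cases to the classes with the largest remainders.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Invented counts of farms by tenure in a hypothetical state.
farms = {"owner-operated": 6200, "tenant": 3100, "manager-operated": 700}
print(proportional_allocation(farms, 200))
# {'owner-operated': 124, 'tenant': 62, 'manager-operated': 14}
```

The allocation fixes only how many cases each class contributes; the cases within each class would still have to be chosen by chance.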

If data constitute a census, then they must be complete. 
Instances or cases, no matter how typical of a group or class, 
cannot be omitted. By hypothesis, they must be complete. 
If, however, they are taken as representative of a class, then
comparatively few instances may suffice for a sample, provided
they are chosen at random, or with intent to include
in suitable proportions the characteristics of the whole.
Illustrations of problems requiring all of the data available, 
and of others which may be studied from samples, may help 
to make the discussion clear. 

The total population of the United States cannot be known 
without the inclusion of every one; the sex composition may 
be accurately determined from a well-selected sample. Similarly,
the total retail sales of meat products in Chicago cannot
be known if the sales of a single merchant are excluded.
The (average) cost of selling meat, however, may be accurately 
known from the records of an adequate sample. Again, if one 
were interested in the question of farm ownership and tenancy 
in a state, for instance, it would probably be necessary to
study more than widely scattered sections, since conditions are 
not necessarily homogeneous as to the prevalence of owner- 
ship, nor uniform respecting the terms under which tenancy 
exists. If the types, amounts, and economic status of immi- 
grant labor in the United States were being studied, one would 
hardly be safe in using data for a single state or city. It might 
be possible by so doing to secure data which are typical of the 
total immigration, but more than typical facts are wanted. 
The problem suggests a quantitative and not alone a qualita- 
tive result. The same is true respecting studies of births, 
deaths, accidents, etc. To record an occasional death, birth, 
or a few of the serious industrial accidents is inadequate. It 
is necessary to include all deaths, all births, and all accidents. 
Accident risks, for instance, cannot be properly determined unless
all accidents occurring, the place where and the condition
under which they happen, and the extent of disability,
etc., are known.

On the other hand, if all that is desired is to indicate the 
trend in a given set of facts, it may suffice to take well-dis- 
tributed samples. Changes in prices can be statistically 
determined without including statistics of all prices. The move- 
ment of wholesale prices, over a period of time, can be meas- 
ured by using the prices of a comparatively few well-selected 
commodities. The same is true of price changes of raw products,
or of goods in which the final consumer is interested.
The trend of the price of real estate, or of stocks and bonds,
may be measured by the use of comparatively few but representative
sales. Wage increases or decreases may be shown
by a process of sampling, provided the samples are chosen
with discrimination. An illustration of a case where samples
suffice is found in the use by real estate boards and tax bodies
of sales statistics in order to determine either the “market”
or “true value” of real estate. The chief consideration is the
representative character of the samples.
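
How a few well-selected commodities may stand for the movement of prices generally can be shown with a simple index of price relatives. This is a sketch only: the commodities and prices are invented, and the unweighted mean of relatives is assumed here for simplicity rather than taken from any published index.

```python
from statistics import fmean


def price_index(base_prices, current_prices, commodities):
    """Unweighted mean of price relatives (base period = 100)."""
    relatives = [100 * current_prices[c] / base_prices[c] for c in commodities]
    return fmean(relatives)


# Invented prices for a base period and a later period.
base = {"wheat": 1.00, "cotton": 0.20, "pig iron": 20.0,
        "coal": 4.0, "sugar": 0.05, "wool": 0.50}
current = {"wheat": 1.20, "cotton": 0.25, "pig iron": 22.0,
           "coal": 4.4, "sugar": 0.06, "wool": 0.55}

all_items = price_index(base, current, base.keys())
sample = price_index(base, current, ["wheat", "pig iron", "sugar"])
print(round(all_items, 1), round(sample, 1))
```

With these figures the three-commodity sample gives an index within about one point of the index for all six. How close a sample comes in practice depends entirely on its representative character.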

If it is desired, for instance, as evidence of the value of a 
piece of property, to enumerate the number of people who pass 
it, it is sufficient to include relatively short periods typical of 
both rush and slack hours for representative days. Likewise, 
the scale of rents in a given district may be determined with 
sufficient accuracy for commercial purposes by considering 
rents of representative houses. It is not necessary to include 
all houses rented. Care must always be exercised, however, 
to see that the sampling, howsoever carefully made for pur- 
poses of original compilation, is suitable for the purposes in 
mind. It may be stated, as a general rule, that the more 
nearly all data are included, the less is the likelihood of bias 
controlling, and the more readily can they be converted to a 
particular use. Under such circumstances the particular facts
desired may be more easily chosen and extraneous ones elim- 
inated. Again, however, nothing better than general principles 
can be laid down as a guide to the appropriate use of secon- 
dary material. Discrimination and caution are essential in 
scientific study and in the formulation of valid conclusions. 

But how is it to be known from secondary data, as pub- 
lished, what bases have been used in selecting the samples? 
The regrettable truth is, that in too many cases it cannot be 
known. Publications have a practice of omitting all qualify- 
ing statements; of removing from the tabulated data all ex- 
planatory details; and of expecting the reader to take on faith 
the accuracy, completeness, and representativeness of the material
which is published. Not infrequently one is at a loss to
know anything about such data. Sources are not given, irreconcilable
totals are not explained, and inconsistencies abound.
Under such circumstances, “Discretion is the better part of
valor.” The student may better refuse to use data than to be
continually in doubt as to their meaning, scope and significance.

4. IN WHAT TYPES OF UNITS ARE THE DATA EXPRESSED? ARE
THEY THE SAME AT DIFFERENT TIMES, AT DIFFERENT PLACES, 
AND FOR ALL CASES AT THE SAME TIME OR PLACE? 

Secondary data are always presented in units of time, of
place, or of condition. They are given, for instance, by months, 
by districts, and by age or size groups. Are the “months” 
always of the same length, and do they always begin and end 
at the same time? Similarly, are the “districts” always of 
the same size and do they have the same boundaries? Again, 
are the age or size groups the same from “month” to “month” 
and from “district” to “district”? Do the “same” data in two
publications refer to the same time, place, and condition? Can 
the material from one source be combined with or used in the 
place of that from another? 
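
The question whether "months" are always of the same length can be made concrete. The sketch below reduces monthly totals to a per-day basis before comparing them; the totals are invented, and the use of calendar days (rather than working or trading days, which may be the more appropriate unit) is an assumption made for simplicity.

```python
import calendar


def daily_average(monthly_total, year, month):
    """Reduce a monthly total to an average per calendar day."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_total / days_in_month


# Invented totals: March appears about 11 per cent above February ...
february = daily_average(2800, 1923, 2)  # 28 days
march = daily_average(3100, 1923, 3)     # 31 days

# ... yet the rate per day is identical.
print(february, march)  # 100.0 100.0
```

The apparent increase in the raw totals here arises wholly from the unequal length of the units of time.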

Moreover, are the same things counted from time to time, 
and from place to place? In what kinds of units are they 
expressed, and what criteria are used to distinguish them?
What, for instance, is a commercial failure, a bank loan, a 
farm, etc., as published in compilations of statistical data? 
Are “failures” and “farms” always identified in the same way?
If they are not, and the differences are unknown, then how
valuable for comparative purposes are the data concerning 
them? 

The units in which data are expressed are of three general 
types. For convenience, they may be classified as simple units, 
as composite units, and as coefficients or ratios.

By simple units are meant those in which one determining 
consideration is prescribed. The ideas conveyed are general;
classes only being distinguished. Most statistics of enumeration
employ simple units: as, for instance, when persons,
animals, acres, buildings, passengers, stocks, deaths, laws,
sales, etc., are counted. In statistics of this type the disturbing
elements due to inaccuracies in the units are reduced
to a minimum. Nothing, of course, is said here about the accuracy
with which the units are defined, or of the care with which the
definitions are followed, or of the accuracy with which the
enumerations are made. The characteristic feature of such
units is the presence of a single determining condition. This
normally guards against as great a degree of error as would
be associated with units which are composite in character.
Such a unit as a “farm” might be easily defined and the
statistics of farms be readily understood. When, however, the
expression “improved” is added to this unit and it becomes
composite, the scope of the definition and its application are
restricted. Error may enter into the added condition with the
same readiness as into the other portion of the combined unit.
Likewise, in statistics of “daily wages” or of a “fair return,”
the same observation applies. Crops in bushels or in acreage
may be readily determined; whether those crops are “normal,”
however, raises further questions. As limiting conditions are
added to simple units, occasions for error and bias crowd in,
and it is these to which attention is drawn in distinguishing
simple from composite units.

Statistical data may also be expressed as ratios or coefficients.
The units then take the form of comparative statements:
as, for instance, when deaths are expressed in terms of
thousands of population, crops as bushels per acre, wealth as so much
per capita, expenses of operation per thousand dollars of
sales, etc.

Every ratio or coefficient has both a numerator and a 
denominator, the number or amount indicated by the ratio 
being in effect a comparison between the numerator and the 
denominator. Ratios imply definite relations between the parts
of which they are composed. If no such relation exists,
or if the one established is “crude” (that is, general rather
than specific), then the units of measurement are misleading.

To establish a coefficient, it is necessary (1) to secure the 
factor in the numerator, (2) to secure that in the denominator, 
and (3) to relate the one to the other. If any of these steps
are not properly carried out, then the ratios or coefficients are 
faulty. And how frequently is the user of secondary data in 
doubt respecting not one but all of them!

A ratio or coefficient should be assignable to the conditions 
which make it possible. That is, the denominator should be 
capable of producing the condition named in the numerator. 
This is only another way of stating the thought of Bertillon 
when he says: "Always relate effects to the causes produc- 
ing them." 

One should not relate the number of deaths from spinal 
meningitis to the whole population, nor in this respect compare 
populations of entirely different age composition. Neither 
should one compare the number of industrial accidents for simi- 
lar plants where the hazard or exposure, in terms either of man- 
or machine-hours, is widely different. Likewise, statistics of 
the number of farm accidents should not be related to the total 
number of farm employes, but only to the number employed
in occupations producing the accidents. The mining industry 
is often classified as "dangerous," yet it is noticeably so only 
when the accidents are related to the types of occupations in 
which the hazard is exceptional.^ 
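
The difference between a crude ratio and one that relates effects to the causes producing them may be illustrated with accident rates. In this sketch the two plants and all of their figures are hypothetical: related to the number of employees the plants appear equally hazardous, while related to man-hours of exposure they do not.

```python
def rate_per(events, exposure, per=1_000):
    """Events per `per` units of exposure (the denominator of the ratio)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return per * events / exposure


# Invented figures for two plants of equal size but unequal hours worked.
plant_a = {"accidents": 30, "employees": 500, "man_hours": 1_500_000}
plant_b = {"accidents": 30, "employees": 500, "man_hours": 1_000_000}

for name, plant in (("A", plant_a), ("B", plant_b)):
    crude = rate_per(plant["accidents"], plant["employees"])
    specific = rate_per(plant["accidents"], plant["man_hours"], per=1_000_000)
    print(name, crude, specific)
# A 60.0 20.0
# B 60.0 30.0
```

The crude rate of 60 per thousand employees conceals the fact that plant B incurs half again as many accidents for each million man-hours of exposure.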

Loose thinking always results when effects are not related 
to the specific causes producing them. Long hours, poor ven- 
tilation and light in factory or mill are often assigned as the 
causes of occupational disease, yet it is not always clear how 
much of it ought not to be assigned to home life, intemperance,
etc. (conditions only remotely associated with or entirely
dissociated from occupations per se). In each case, responsibility
can be assigned only after investigation and after
each effect is related to its specific cause.

^For a more complete discussion of Units of Measurement, see Chapter
IV, infra.

It is not a sufficient justification for the violation of this
principle to maintain that in economic life effects are rarely
if ever to be attributed to single causes, and, therefore, that all
effort to allocate the responsibility is useless. The statement
is true but the inference does not follow. It serves, however,
to call attention to the extra care which it is necessary to take
in matters affecting economic and social conditions before
conclusions are drawn from, and policies mapped out upon 
them. Again, the best that can be done here is to call atten- 
tion to this important fact and leave the student, thus warned, 
to make application of it in each problem considered. 

5. ARE THE DATA ACCURATE? 

Accuracy is a relative term; it is impossible to secure absolute
accuracy in measurements affecting social and business
affairs. Some are more accurate than others, and so-called
“accurate measurements” for one purpose may be grossly
inaccurate for others.

The type of accuracy to which reference is made is not of the 
clerical type, although that is important. Computing devices 
which insure accuracy of this kind are now in common use, 
and it is seldom necessary, in using secondary data, to check 
numerical computations. Occasionally, however, errors of this 
type do occur.

“Sometimes they appear in the form of a disagreement of supposedly
identical figures given in different numbers of the same
journal, or of important inconsistencies in figures taken from the
same table. Errors of this sort are, of course, sometimes due to
misprints, which no care in publication can wholly eliminate.
Sometimes, seeming inconsistencies are occasioned by the fact that
preliminary figures are later subjected to decided revision. * * * But
whatever their cause, the fact that significant discrepancies of various
types do occur indicates the need of careful examination of * * *
data before they are utilized.”^

The use of secondary statistical data is conditioned, among 
other things, by (1) the accuracy with which they are re- 
ported, (2) the accuracy with which they are determined, and 
(3) the accuracy with which they might be determined. Each 
of these different points of view requires brief consideration.^

The accuracy with which data are reported and collected de- 
pends upon (1) the type of informant, (2) the nature of the rec- 
ords kept, (3) the type of questions asked, and (4) the care 
used in answering them. If difficult and unfamiliar questions, or 
questions which in any way incite distrust or suspicion, are
asked, answers are likely to be either incomplete, brief, non- 
committal, general, or purposely evasive. Age, for instance, 
may be accurately known, but falsely reported. Wages may 
be known and yet incorrectly reported because of a suspicion 
as to the use to which the data will be put. Moreover, even 
in cases where there is no reason for data to be falsely reported, 
error may occur in transcribing and tabulating them. 

On the other hand, data may be correctly reported but the 
report itself be inaccurate because the answer is wrongly de- 
termined. Much of the data, until recently, respecting causes 
of death fell under this head. No necessary difficulty is ex- 
perienced in reporting,^ but only in determining the precise 
cause, or in calling by the same name the same thing. The 
necessary corrective is, of course, the use of a standard
classification of causes of death. Likewise, statistics of occupations
suffer greatly from the lack of a standardized nomenclature. 
Identical occupations are called by different names; things 

^Persons, Warren M., “Indices of Business Conditions,” The Review
of Economic Statistics, Cambridge, Mass., January, 1919, p. 6.

^For discussion of similar points respecting wage data, see Chapter
V, “Types of Secondary Wage Data.” 

^See “Errors in Death Registration in the Industrial Population of
Fall River, Massachusetts,” Monthly Review, U. S. Bureau of Labor
Statistics, Vol. 5, No. 1, July, 1917, pp. 2-8. This article, slightly
adapted, is reprinted in the author’s Readings and Problems in Statistical
Methods, Macmillan & Company, New York, 1920, pp. 141-147.
which are equal to the same thing, in reality, are not equal to
each other in name. As a basis for determining occupational
risk, and for developing schemes of accident compensation or
insurance, for instance, they are almost worthless. Fortunately,
however, some progress toward uniformity of occupational
naming is now being made. Here, as in the former case,
the personal equation is important, but more often the real 
source of trouble lies, as in the instances cited, in the nature 
of the problem itself. 

Statistics of “capital employed” in manufacturing industries,
as reported by the United States Census Bureau, are faulty
because of the inaccuracy with which they are determined.
The definition of capital for statistical purposes offers the
first difficulty. Authorities are not agreed as to what should
be included as “capital.” The reasons for including or
excluding different categories vary and are of different force in
different industries, or in the same industry under different
conditions of management and forms of business organization.
For census purposes, even, such a unit must of necessity be
used with little more than a semblance of accuracy; and, of
course, the statistics relating to it ought to be considered as
estimates. The same thing applies to “value of products,”
“cost of materials,” “expenses,” etc. The difficulties are not
necessarily due to errors in reporting (yet, undoubtedly, they
are important), nor to any limit upon the accuracy with which
such facts might be determined, but rather to the accuracy with
which they are determined under the conditions of collection.

If nothing more is desired than to indicate a trend, this 
may be done, in cases where complete accuracy of detail is 
wanting, provided errors are distributed uniformly about the 
average and tend to correct each other, and where sampling 
is representative. These conditions, however, so seldom obtain
(never in the last instances cited) that data of these
kinds must be used with great care for any use where ac- 
curacy is important. It is painful to see nice distinctions and 
weighty conclusions rest upon such questionable support! 
On the other hand, secondary statistical data are frequently
compiled where it is impossible to secure absolute accuracy, 
and where no pretense should be made that it is realized. 
The data at best are crude estimates. At present, for instance, 
no statistical machinery is available accurately to determine 
the amount of gold-producing ore in the United States; the 
horse-power of our water power resources; or the amount of 
standing timber in the United States.^ Of course, there may 
be accurate as there may be inaccurate estimates, and it is 
always necessary to choose those which, all things considered, 
seem best to meet the requirements of the case. Moreover, 
they should be used as estimates. Essentially accurate conclusions
may be drawn from rough estimates, if the basis upon
which they are made is known, but even then, statistical skill 
and sound judgment are required. 

Moreover, not all phenomena can be statistically measured. 
Numerical frequency may be of no real significance. For in- 
stance, the devotion of a people to a principle of right or jus- 
tice can hardly be measured by the number of those who 
find no occasion to violate it. Neither can respect for law 
be determined by estimating or counting the number of people 
who remain out of jail. Conversely, disregard for law is not 
fully measured by the number of arrests and convictions. The 
number of those insane is not necessarily indicated by the 
commitments to insane asylums together with the occupants 
of such institutions. The sacredness with which marriage is 
regarded is not accurately reflected by the number of divorces 
granted; nor the number who are educated secured by totaling 
the students enrolled in institutions of collegiate and univer- 
sity rank. It is hopeless to expect statistical data alone to 
answer these questions. 

^See the interesting report on “The Lumber Industry, Part I, Standing
Timber,” by The United States Bureau of Corporations, 1913, where
methods of estimating the amount of standing timber in various districts
and for various woods are described and criticized, pp. 7-10, 45 ff. This
is reprinted in the author’s Readings and Problems in Statistical Methods,
Macmillan & Company, New York, 1920, pp. 91-110.
6. DO THE DATA REFER TO HOMOGENEOUS CONDITIONS?

Business and social relationships change; they are always in
a state of flux. New policies, methods, and standards are al- 
ways being introduced. New units of measurement, there- 
fore, are needed to indicate the nature and extent of the 
changes, the old ones having lost their significance. The facts 
of yesterday may have little meaning for those of to-day. 
For instance, if, in a given market, “future” are supplanting
“spot” transactions, and the level of prices has changed because
of this fact, then prices of to-day cannot be compared
with those of yesterday, when such methods of dealing were
less common. Moreover, retail and wholesale prices cannot
be directly compared. The conditions affecting them are differ- 
ent. Similarly, paper and gold prices cannot be compared un- 
til they are put upon the same basis. 

Not only may statistical data be descriptive of non-homo- 
geneous conditions (and this fact be not revealed), but they 
may also vary in composition at different times. Reporting, 
editing, tabulating, and analyzing may be of widely different 
degrees of excellence. Emphasis may have been differently 
placed; different definitions may have been insisted on; new 
units of measurement or modifications of old ones may have 
been employed; wider or narrower fields may have been 
covered; the proportional elements used to make up a total 
may have changed materially; etc. The presence of these and
similar conditions makes comparisons over long periods difficult.

The desire for “comparability” often becomes the controlling 
factor in statistical computation, and serious omissions, 
strained interpretations, etc. (all important in the use of the 
data for a given time), are countenanced in order to preserve it.
For instance, the retention of the “capital” inquiry, in all its 
crudity, in the statistics of manufacture in the United States 
Census is largely out of consideration for the “value of comparisons.”
The omission, until recently, of fifteen commodities
formerly used in the computation of the index number of
retail prices by the United States Bureau of Labor Statistics
at least raises the question whether prices before 1907 can be
compared with those since that date.^ The various definitions
of a “farm,” of an “establishment,” or of “manufacturing,” as
used by the United States Census Bureau at different times,
make comparisons difficult over an extended period. Exports
and imports, for instance, whether expressed in quantities or in
values, must always be interpreted in terms of the units of
measurement employed.^ The student should always go behind
the printed figures and be sure of the units, their interpretation,
and the weight assigned to the different factors in
the composite groups before comparing them or using them as
a basis for a conclusion.

^The lack of comparability has been definitely asserted by a recent
Commissioner of the Bureau of Labor Statistics. “Some Features of
the Statistical Work of the Bureau of Labor Statistics,” Royal Meeker,
Commissioner, Quarterly Publications of the American Statistical
Association, March, 1915, pp. 431-441.

^Most interesting discussions of the difficulties of making international
comparisons of import and export statistics, and of the imperfections of
our own import and export statistics, are contained in an article by
Frank R. Rutter on “Statistics of Imports and Exports,” in The Quarterly
Publications of the American Statistical Association, March, 1916,
pp. 16-35. Apropos the topic here under consideration, the following
extracts are of interest:

By virtue of a law passed in 1893, the agent of a railroad company
carrying goods to a foreign country by land was made punishable to
the amount of $50 for failure to present a manifest to the collector of
customs. “The effect of the change in law is reflected in the exports
through Buffalo to Canada. From less than $500,000 in 1890 the figures
jumped to over $4,000,000 in 1895.” Ibid., p. 20.

On the matter of units of measurement and classification, the following
quotation is of interest: “The greatest need for the expansion of
the classification is found in the case of exports. The most detailed
classification of exports now covers less than 600 items, while in the
imports for consumption there are about 3000 distinct items. The chief
preventive of an increase in the number of items is the indefinite
character of export declarations. So many articles are described merely by
general terms that it is out of the question to separate articles
frequently of much commercial importance.

“Defects in the present classification, aside from its incompleteness,
are the incomparability of the import and export schedules and the
failure to conform to current commercial terms. The latter defect is
due to the preservation in the tariff of many terms now obsolete, and
the necessity of having the statistical classes follow closely the tariff
items.” Ibid., p. 26.

On the definition of “imports” the author says:

“What is generally understood by the term ‘imports’? Legally, an
article is imported when landed, whether for immediate consumption
or for storage in bonded warehouses. From an economic point of view,
however, bonded warehouses may well be regarded as foreign territory.


7. ARE THE DATA GERMANE TO THE PROBLEM BEING STUDIED?

He who has occasion to use secondary data, having inquired
into the standing of the organization publishing them, and hav-
ing satisfied himself as to the purpose for which they were
issued, the nature of the data themselves, the units in which
expressed, their accuracy, and the homogeneity of the con-
ditions to which they apply, must then ask himself the fol-
lowing questions: Are the facts really germane to my prob-
lem? Can they be used for the purpose which I have in mind?
These are significant questions. Upon the answers to them
depend all subsequent steps in statistical procedure.

The door of the bonded warehouse is really the economic frontier of the
country.

“Since the United States is not a large reexporting country, the differ-
ence between ‘imports’ and ‘imports for consumption’ is largely one of
time. The instances in which goods are exported from warehouses are
few as compared with the instances in which after the lapse of time
goods are entered for consumption within the country.

“Perhaps the distinction is most clearly brought out by an illustra-
tion. While the last tariff was under discussion wool in large quantities
was landed at our ports and stored in bonded warehouses until Decem-
ber 1, 1914, when it could be withdrawn without payment of duty. Was
such wool really imported when it was landed or when it was removed
from the warehouse?

“On the export side we have a clear distinction between domestic ex-
ports and foreign exports. On the import side imports for consumption
are most nearly comparable with domestic exports, yet not fully com-
parable, since free goods are not generally warehoused and may be en-
tered for consumption although intended for reexportation. To be
strictly accurate, dutiable imports for consumption should be compared
with domestic exports and free imports with domestic and foreign ex-
ports combined.” Ibid., p. 28.

“Perhaps the most striking instance of the unfortunate result of our
method of valuation is seen in the import prices of rubber. Notwith-
standing the improvement of plantation rubber, Para rubber is still
quoted at a slightly higher price. In Brazil, however, there is a heavy
export duty, which constitutes an important element in the price. This
duty is not included in our statistical valuation with the result that
the value of India rubber imported from Brazil during the fiscal year
1914 averaged only 40 cents a pound, while the import value of that
from Ceylon averaged 60 cents a pound.” Ibid., p. 30.

^ Bowley, A. L., “The Improvement of Official Statistics” in the Jour-
nal of the Royal Statistical Society, September, 1908, Vol. 71, pp. 461-
469, particularly. This article with slight adaptations is reprinted in
the author’s Readings and Problems in Statistical Methods, Macmillan
& Company, New York, 1920, pp. 150-159.

Many statistical data, which have only a general applica- 
tion to a particular problem, may, if used with discrimination, 
corroborate a thesis which they would not alone be sufficient 
to support. Contrariwise, they may be sufficient to throw sus- 
picion upon, although they would not themselves disprove, it. 
How data may be used can never be known until their char-
acteristics have been determined. They should never be used
without this information.

“The first thing to realise about official, and indeed all, statistics,
is that their meaning is always technical and generally not pre-
cisely that which might at first sight be expected. * * Statistics
on any subject have generally a long history. In the beginning an
organisation had to be initiated to collect records of those things
connected with the subject which it was anticipated could be counted
or measured. Experiment showed what facts could be ascertained
and where the organisation was weak, criticism and analysis de-
fined and interpreted the meaning of the totals and averages ob-
tained, and showed their relation to the facts of which knowledge
was desired. The organisation was gradually improved, new methods
were devised for making good deficiencies, the meaning of the totals
was modified and new definitions were necessary. When one has
followed the process by studying successive reports or by reading
a well-informed book or article on the subject the limitation and
meaning of the totals can be appreciated; failing this, the best plan
is first to think out for one's self what one would expect or wish
to be included in a total (e.g. of the number of persons unemployed),
then to read very critically word by word the heading, explanation
and notes in the summary (always inserting some such phrase as
'recorded by' or 'reported to' or 'computed by' the department con-
cerned), and then to get the larger report on which the abstract is
based and study whatever information is there given about the
method and purpose of the investigation. The critical faculty
should be very alert when statistics are in question; the published
heading may be pedantically and officially correct, but it will not
contain such a statement as 'every word is used in a technical sense
and has a special meaning only known to the officials who made the
compilation, the part that is not recorded is more important than
that which is, where the facts are not known an estimate has been
made by a method which cannot for departmental reasons be di-
vulged, and the method of computation has been modified since
the last issue of the numbers,' yet part or all of this is sometimes
implied.” ^

In spite of the fact that statistics of some sort are to be 
found on almost every conceivable subject, those which are 
available may not suit the purpose in mind — if it is clearly 
formulated — or they may apply to inappropriate times, places, 
or conditions. It is then necessary to collect those which are 
suitable. Primary rather than secondary data must be se- 
cured. A discussion of the problems connected with such 
a task is the subject matter of the following chapter. 
CHAPTER III


COLLECTING AND EDITING PRIMARY
STATISTICAL DATA

I. Introduction

The student or investigator who has occasion to collect pri- 
mary statistical data must ask and answer the following 
questions: 

(1) What is the precise problem upon which statistics are required?

(2) Does the problem, as formulated, lend itself to statistical treatment?

(3) What types of data are necessary for its analysis or solution?

(4) Are they likely to be available in suitable form?

(5) Are they likely to be adequate for the purpose in mind?

(6) Will they have the required degree of accuracy, consistency, and comparability?

(7) Can the data be made available within the time limit required: that is, will they have the required currency?

(8) Are there likely to be any restrictions upon the use of data which will compromise the purpose which they are to serve?

(9) What sanction is necessary, and what method of procedure, with the sanction available, must be followed in order to secure the desired facts?

Subsequent steps depend upon the answers supplied to them. 
They constitute a sort of formal catechism to which one should
be willing to subject himself before proceeding further. 

II. Preliminary Conditions to the Collection of Primary Data

The problems involved in satisfactorily answering each of 
the above questions require separate consideration. 

1. WHAT IS THE PRECISE PROBLEM UPON WHICH STATISTICS
ARE REQUIRED?

The idea of a problem suggests a difficulty, some thing or
an aspect of a thing which is unsettled or not understood.
Before it can be stated, it must be clear that there is a problem.
Its precise nature will take form as its different aspects are
contemplated, that is, as they are mapped out and delimited.
Such contemplation is thinking. To think on a problem is
to survey the facts about it; to define and classify them, and
to see them in their proper relations. Not until it is known
what facts are to be considered and in what way can a prob-
lem be stated; and once it is stated clearly, its solution is
greatly expedited.

To think about a problem and to state it require knowledge 
concerning it. This has to be acquired: it does not simply 
“come.” It is sheer waste of time to begin collecting data on
a problem until it is defined. It must be seen in relation to
other problems. What these relations are can be known only 
by thought about them. 

The first and most essential step concerned with the collec- 
tion of statistics with respect to any problem, therefore, is to 
define and state the problem itself. 

But all problems do not lend themselves to statistical study. 
Some do; others do not. Moreover, when the statistical ap- 
proach is used, it is not always used in the same way. Neither 
does it involve the use of the same methods. Accordingly, 
the second question which must be asked before data are 
collected is: 

2. DOES THE PROBLEM, AS FORMULATED, LEND ITSELF TO 
STATISTICAL TREATMENT? 

Statistical studies are necessarily quantitative; statistical
facts are always numerical. Moreover, the frequencies or
attributes of things have to be sufficiently distinct so as to
make it possible to enumerate them. It is possible, for in-
stance, to study statistically the sex and age characteristics
of insane inmates in hospitals; it is not possible by this method
to determine the fact of insanity. It is also possible to meas-
ure the distribution of the wealth and income of a people, but
it is not possible by statistical means to determine what dis-
tribution is socially or politically desirable. Again, if bank-
ruptcy is considered equivalent to business failure, then the
number of such failures, by types of business, age of business,
location, capital investments, liabilities, etc., may be statis-
tically determined. The parts which dishonesty, moral cow-
ardice, speculation, etc., play as contributing factors to such
failures, however, cannot be directly measured in this manner.
Why? Because they cannot be numerically stated, and if
they could, their significance would not be indicated by quan-
titative preponderance.

A problem to be susceptible of statistical study should have 
characteristics which are quantitatively measurable. More- 
over, they should be capable of being distinguished with re- 
spect to time, to place, or to degree. Such conditions hold, 
for instance, for prices, wages, deaths; they do not obtain, 
for instance, for integrity, honesty, loyalty. 

If it is decided that a problem may be studied statistically, 
and for this purpose it is necessary to collect primary data, 
the next question which the investigator must ask himself is:

3. WHAT TYPES OF DATA ARE NECESSARY FOR ITS ANALYSIS
OR SOLUTION?

The answer to this question depends upon (1) whether data 
are needed to supplement, corroborate, or disprove those al- 
ready available upon the subject, or (2) whether an entirely 
new and different set of facts, expressed or measured in new 
units, is necessary. If the former condition obtains, then the 
data collected must have the same characteristics as those with 
which they are to be compared, or which they are to supple- 
ment. They may apply to different times and places pro- 
vided they exhibit themselves in the same way. Indeed, the 
types of data already available on a problem determine the
nature of those which are collected to supplement them. To
duplicate what has been done is justifiable only when it is
felt that existing data are incomplete, unrepresentative, or in
some other respects inadequate or unsuited to the uses to
which one desires to put them. The aim should be to supple-
ment, to carry further the type of analysis which has already
been made, and so to make the data already available function.
Too frequently, statistical studies are uncorrelated with those
already existing. They cover old ground, and contribute little
or nothing to an understanding of the problems with which
they have to do, largely because they do not constitute a
necessary part of a comprehensive program, nor dovetail with
the studies which have already been made. They begin and 
end as isolated, unrelated efforts. 

If, on the other hand, data are to be collected but not to 
supplement those already existing, then choice is free, but 
within clearly defined limits. The first question which is pre- 
sented is: 

4. ARE THEY LIKELY TO BE AVAILABLE IN SUITABLE FORM?

Data which exist may not be available. They may be 
(1) confidential, (2) expressed in units unsuited to a particular 
use, or (3) scattered over so long a period or over so wide a 
territory that the expense involved in their collection is pro- 
hibitive. Another question is: 

5. ARE THEY LIKELY TO BE ADEQUATE FOR THE PURPOSE
IN MIND?

A satisfactory answer to this query can be made only if 
the purpose is known, and if means are available for knowing 
the probable nature of the data. It is taken for granted that 
the first condition is fulfilled; the latter may be satisfied by 
sampling the data, or by consulting with those who possess 
them. 
6. WILL THEY HAVE THE REQUIRED DEGREE OF ACCURACY,
CONSISTENCY, AND COMPARABILITY?

The types of records from which they are to be drawn, the
honesty of the informants, the care with which they are trans-
ferred from the original records, and the manner in which the
information is solicited all have significance in this respect.

It is necessary to know the types of informants to whom 
appeal must be made. If they are ignorant, inclined to de- 
preciate the significance of the problem under study, or to 
oppose its continuance; if they are inclined to look upon every- 
thing as inconsequential and useless, little weight can be at- 
tached to the answers given. Likewise, the time, money, and 
organization available should be considered. Data may exist, 
informants be ever so willing to supply them, and yet the 
necessary facts be unavailable because of lack of funds or 
of time in which to secure them. Few people, not accustomed 
to planning statistical work, clearly realize the time, energy, 
and expense involved in a thorough statistical investigation. 

Further questions must also be asked and answered before 
the task of collection is begun. One which is important is as 
follows: 

7. CAN THE DATA BE MADE AVAILABLE WITHIN THE TIME LIMIT
REQUIRED: THAT IS, WILL THEY HAVE THE
REQUIRED CURRENCY?

On some problems, data to be significant must be current. 
This is true when they are needed to determine present rather 
than to reflect past conditions. On the other hand, for the 
solution of certain problems, current data are of less value 
than those which refer to the past. If, for instance, the normal 
relation between sales and expenses is to be determined, then 
current data are inadequate. Those which have to do with a 
normal period in the past are necessary. 
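
The point may be illustrated by a small computation. The Python sketch below is illustrative only: the function name and every figure are hypothetical, and the ratio of expenses to sales is assumed here as one simple way of expressing the "normal relation" between the two.

```python
def expense_ratio(sales, expenses):
    """Ratio of total expenses to total sales over the periods supplied."""
    return sum(expenses) / sum(sales)


# Hypothetical figures for three years regarded as normal (not real data).
normal_sales = [200.0, 220.0, 210.0]
normal_expenses = [150.0, 165.0, 157.5]

# Hypothetical figures for the current, possibly abnormal, year.
current_sales = [180.0]
current_expenses = [153.0]

normal_ratio = expense_ratio(normal_sales, normal_expenses)
current_ratio = expense_ratio(current_sales, current_expenses)

# The current ratio is judged against the normal one, not taken as the norm.
print(round(normal_ratio, 3), round(current_ratio, 3))
```

The current figure acquires meaning only by comparison with the ratio drawn from the normal period; taken alone it would be mistaken for the norm.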

When matters of current business or of social interest are 
pressing for solution, statistical data referring to the past 
are heavily discounted. The fact is, however, that the policies
of to-day grow out of those of yesterday, and into those of
to-morrow. The past is the present viewed in retrospect; the
future, the present viewed in anticipation. The desire always
to be “up to date” amounts in some instances almost to a
mania. Sober thought of the past is often stabilizing, serving 
as it does to give a proper perspective to the present. 

8. ARE THERE LIKELY TO BE ANY RESTRICTIONS UPON THE USE
OF DATA WHICH WILL COMPROMISE THE PURPOSE
WHICH THEY ARE TO SERVE?

Restrictions may take a variety of forms. For instance, 
certain data (1) can be published only as totals, or the in- 
stances only in groups; (2) cannot be published at all; (3) 
cannot be distributed except to a select few; or (4) if pub- 
lished at all must be given general distribution. 

9. WHAT SANCTION IS NECESSARY, AND WHAT METHOD OF
PROCEDURE, WITH THE SANCTION AVAILABLE, MUST
BE FOLLOWED IN ORDER TO SECURE THE
DESIRED FACTS?

Most public agents are possessed of mandatory power: that 
is, they may compel answers to be made to questions asked. 
Private individuals do not usually have the same sanction 
and its absence in most instances is a handicap. It is, however, 
sometimes possible for investigators, through contact with in- 
formants, and by cultivating their good-will, to develop in 
them a feeling of obligation to report, which more than com- 
pensates for any lack of mandatory power. So far as public 
statistical organizations are concerned, conspicuous instances, 
where a feeling of obligation to supply information has been 
well developed, are the cases of price reporting to the United 
States Bureau of Labor Statistics, and the reporting by unions 
of the conditions of employment to the Bureau of Labor Sta- 
tistics in Massachusetts and in New York. 
By cultivating the good-will of informants, these bureaus
have been able to enlist their support, and to secure excellent 
data with little actual inconvenience and cost. Various ways 
are open for securing their interest and good-will. One 
method is to guarantee that confidence will not be abused, 
that the study is scientifically undertaken and without the 
idea of personal gain or aggrandizement. Sometimes it is ac- 
complished through assurances being given that the request for 
statistics extends to a whole class rather than to a selected 
number of a class, and that when the returns are compiled they 
will be supplied gratuitously to all those who have contributed 
to their collection. Sometimes an effective method is to 
appeal to feelings of state or local pride, or to class-conscious
sentiments.

Another way of gaining the confidence of informants is to
study their interests and to cultivate their good-will by corre-
spondence. This method is being used effectively in Massa-
chusetts, where bureau officials are careful to indicate by semi-
personal letters the value to informants and to the public
generally of data to be collected, and the importance of answer-
ing specifically and promptly the inquiries made. Even where
mandatory power exists, it is not an uncommon practice for
statistical bureaus requesting information, while quoting the 
terms of the law under which the collection is made, to make 
the idea of co-operation their chief appeal. A display of 
force or the use of threats should be used with discrimination, 
inasmuch as it may tend to incite a spirit of distrust and 
opposition rather than of co-operation. 

Private individuals, as contrasted with regularly constituted 
authorities, are usually handicapped in the collection of data 
by lack of sufficient sanction. The limitations under which 
they operate should be clearly kept in mind in order to guard 
against a too sanguine belief that they will always secure the 
information desired. Too great confidence as to the outcome 
of a given undertaking generally characterizes the efforts of
the inexperienced. 
III. The Collection Process

1. PURPOSE AND PLAN

The process of collecting primary statistical data depends 
upon the purpose in mind and the plan outlined to realize 
it. There can be nothing hazy, confused, or indefinite about 
them if satisfactory results are to be secured. The problem 
should be clearly thought through and the plan be made 
complete from beginning to end. Only by so doing is it pos- 
sible to provide in advance for the contingencies which are 
sure to arise. Both require thought and care. Rarely, if ever, 
can statistical studies be rushed. Progress is made slowly.
An adequate foundation respecting both purpose and plan is 
essential. They are so important that much of Chapters IV 
and V is devoted to a discussion of them for typical problems. 

2. THE COLLECTION PROCESS DESCRIPTIVELY CONSIDERED

The ways in which data are collected vary with the nature 
of the problem, and the organization which undertakes the 
task. No two problems require exactly the same methods. 
Each has its peculiar requirements. In every case there is a 
best method, and it is part of the task of the organization to 
determine what it is under the conditions obtaining. 

Statistics, like other information which is desired, must be 
secured by some one, in some way, according to some method,
and from some source. The one securing it may be the agent
(private individual or organization) or his representative.
The way in which it is solicited may be by interview, by per- 
sonal letter, by questionnaire or schedule, or by all of these 
means. The method of securing it may involve a count or an 
estimate, and the source of both may be found in personal 
opinion, or in records. 

The simplest situation in which data are collected is prob- 
ably that in which an organization or business merely sum- 
marizes or assembles information about its own activities. 
The collection may involve data currently kept in systematic 
form in its own records, or it may pertain to facts not a matter
of record but of estimate or opinion. Examples of the first
type are sales, expenses, profits, output, loans, capital, assets,
number of employees, etc. Illustrations of the second kind
are estimates or opinions of salesmen respecting sales pros-
pects for the coming season or year, general business condi-
tions, influence of competitors, etc. In problems relating to
matters of record, adjustments in the form of accounts, units,
etc., are necessary where the methods are not standardized
in the different departments. In all cases of this sort, how-
ever, it is assumed, after the plan is thoroughly worked out,
that so far as the collecting or assembling of the facts is con-
cerned, the task is largely one of transcribing in suitable form
the data available. Motives for withholding part of the
facts, for inaccurately stating those supplied, or for attempting
to defeat the project, are not generally present. Unity of
management tends to guarantee against failure in these re-
spects.

Moreover, personal bias, the desire to make a case, or re- 
liance on incomplete data do not normally obtain under such 
conditions. Of course, data assembled in this way are not 
always adequate for the purposes in mind. They may be 
incomplete, and inaccurate for other reasons than those sug- 
gested, more particularly if the assembling is done under the 
direction of some one untrained for such work. But collec- 
tion under such circumstances does not present the problems 
which confront the statistician from the outside who attempts 
a similar task, and who has no other sanction than that of an 
impersonal government or his own good intentions, and who 
too frequently has not the tact to enlist the sympathy and co- 
operation of those upon whom he must depend for success. 

It is, of course, true that most smaller business houses do 
not understand the uses to which their data can be put, and 
consequently do not have satisfactory statistical records. 
Moreover, those who appreciate their possible significance may 
have considerable reservation about giving over to a separate 
department the responsibility of informing others of the weak
places in their organizations. “Statistics” are often in ill
repute because they are considered either in themselves in- 
fallible or fallible — depending on whether they show the right 
or wrong thing — or because they are used unscientifically. 
There is almost as much science in the way statistics are col- 
lected as there is in their subsequent use, but this truth is 
rarely appreciated by the inexperienced. 

More difficult situations in collecting data are encountered
when information, although a matter of record, is desired about 
business, trade, or social phenomena by some one from the 
outside. The nature of the records is frequently unknown, 
and direct access to them impossible. If they are furnished 
in the original, adjustments, corrections, and interpretations 
have to be made after they are received. If their contents are 
transcribed by an informant, they have all of the limitations 
possessed by the originals; if by the agent soliciting the in- 
formation, they must be taken in the form in which they are 
found or adjusted in keeping with his idea of appropriateness. 
For an agent to tamper with original records is a dangerous 
procedure. The meaning of the facts may be confused; they 
may be wrongly interpreted and combined in ways in which 
they were never intended. 

To permit informants to transcribe their records is expedi- 
tious, but the liberty may be construed as license. In some 
instances, requests for information may be ignored, or an- 
swers given which are evasive or susceptible of different inter- 
pretations. Unless there is some check upon the information 
supplied, this method of securing data is inadvisable for gen- 
eral use. 

Where questionnaires are used and informants are required 
to fill them out, the answers to questions may be incorrect 
because the questions (1) are misunderstood, (2) call for in-
formation about which little or nothing is known, or (3) use
units of measurement which are unfamiliar. Long explana-
tions cannot conveniently be made upon questionnaires, and
if they are supplied, no attention may be given to them. Only
when informants feel obliged to answer questions, or where
answers may be thoroughly checked, can complete reliance be
placed in information supplied by schedules which informants
themselves have filled out. In the investigation into “Wages
and Regularity of Employment in the Cloak, Suit, and Skirt
Industry, etc., in New York,” the information, supplied upon
1429 schedules filled out by the workers and gathered by the
shop chairmen, was found to be “so full of errors that they
were discarded as entirely unreliable.”^

So much for a consideration of the problems of securing 
information, which is made a matter of record, when it is 
assembled by those within an industrial or other business, 
or when collected by those from the outside. 

On the other hand, information is frequently desired about 
conditions which are changing. Each time it is wanted, the 
phenomena with which it is concerned must be separately ob- 
served. The following are illustrations: inventory stocks on 
hand, the population of cities, people passing a given corner, 
daily receipts of cattle at Chicago, etc. To secure such aggre- 
gates, the instances must be counted, the act being repeated 
each time data respecting them are desired. Records of past 
events may have a certain significance as tests of the accuracy 
of a given enumeration, but of and by themselves, they do not 
supply the information that is desired. A photograph, as it 
were, must be taken of the phenomena at the time in question 
and for the area or conditions involved. 

^Bulletin of the U. S. Bureau of Labor Statistics, Whole Number 147,
p. 14, Washington, D. C., June, 1914.

The nature of the problems involved in a count will be evi-
dent from a consideration of a typical case. An example in
which counting is required is the enumeration of the popula-
tion of the United States. The excess of births over deaths,
together with the surplus of immigration over emigration, are
the sources making for an increase of population. Reasonably
accurate statistics of births and deaths are restricted in the
United States to the so-called registration area. Statistics
of immigration and emigration are reasonably accurate for the 
country as a whole. Statistics of distribution of immigrants, 
more accurate than possibly the state to which they declare 
they are bound, or of the origin of the emigrant, more definite 
than his last place of residence, are not available. Little or no 
record is kept of migratory movements of population within 
the country. The result is that for statistics of population, 
reliance must be placed in the decennial census made by the 
United States Bureau of the Census. 

The actual enumeration of the population of 110,000,000 
people in a district as large as the United States is a gigantic 
undertaking. Even if the tendencies for districts to exag- 
gerate their figures and for enumerators to pad their lists 
in order to increase their remuneration are ignored, the diffi- 
culties are almost insuperable. Coupled with these condi- 
tions, and serving the political purpose which a census does, 
little value so far as absolute or even near accuracy is con- 
cerned can be attached to it as an actual enumeration or count. 
With the reasons for this state of affairs, attributable as it is 
to the method of appointing enumerators, to the inherent size 
of the task, to the divided duties of the enumerators between 
a population census proper and an agricultural and occupa- 
tional survey, to the political purpose which it serves, etc., 
we are not here particularly concerned. Our chief interest is 
in the method rather than in the accuracy of the data col- 
lected. Questions involving the determination of legal resi- 
dence, the treatment of floating population, of people in transit 
from place to place, etc., are involved in the process of 
counting. 

In the case of a population census, partial checks on the 
accuracy of the count are found in the preceding censuses, in 
the records of deaths, births, immigration, emigration, and 
in the fact that normally the distribution of age and sex 
classes is essentially uniform from period to period (this relationship
is somewhat disturbed in the United States by the
influx and egress of mature male immigrants). These checks,
however, valuable as they are to keep in bounds of reasonable 
inaccuracy the results of the canvass, do not, even under the 
best of conditions, lessen the inherent difficulties of counting
large aggregates even with approximate accuracy. The fre- 
quency of contested elections, in cases where crookedness is 
admittedly absent, furnishes another evidence of the difficul- 
ties in correctly counting large aggregates. 
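
The partial check afforded by the preceding census and by the records of births, deaths, immigration, and emigration may be sketched as a simple balancing computation. The Python sketch below is illustrative only: the function names and every figure in it are hypothetical and are not census data.

```python
def expected_population(previous_count, births, deaths, immigrants, emigrants):
    """Population implied by the preceding census plus the recorded changes."""
    return previous_count + births - deaths + immigrants - emigrants


def census_discrepancy(enumerated, expected):
    """Return the absolute and the relative gap between count and expectation."""
    gap = enumerated - expected
    return gap, gap / expected


# Hypothetical figures for a single district (not actual census data).
expected = expected_population(
    previous_count=1_000_000,
    births=250_000,
    deaths=150_000,
    immigrants=80_000,
    emigrants=30_000,
)
gap, relative_gap = census_discrepancy(enumerated=1_165_000, expected=expected)
print(expected, gap, round(relative_gap, 4))
```

A large gap does not show which figure is at fault, since the vital and migration records are themselves imperfect; it merely marks the count as one requiring further scrutiny.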

Not only may actual instances be recorded and actual cases 
be counted, but the probable frequency of their occurrence or 
appearance may be estimated. Estimates may be made on 
the basis of what has occurred in the past, or on what is likely 
to occur in the future. They may be made on the basis of
direct material, as when expectancy of death (life tables) is 
based upon the number and conditions of deaths. They may
also be made from allied material, as when call-loan rates of 
interest are estimated on the basis of bank reserves, the net 
interior movement of money upon the size of crops, the trend 
of business on the combined factors making for business dis- 
trust or confidence, the probable price of corn upon the price 
of wheat, etc. Indeed, in the business world most dealings 
are hazarded upon the ability to foretell the most probable 
results from a given set of conditions. Market prices of 
cereals are, in large part, a reflex of the likely condition of
croppage during the subsequent six or twelve months balanced 
over against the likely conditions of demand; prices of securi- 
ties are based upon an estimated earning capacity of the prop- 
erties floating them; increases of investment are hazarded 
upon a continuance of favorable trade conditions, or the fav- 
orable disposition of the legislature, etc. 

Much of the statistical data regularly compiled on the agri- 
cultural outlook; on the depletion or conservation of resources; 
upon national wealth and its distribution; upon the benevo- 
lence or malevolence of a given state policy toward business 
and industry, or the likely consequences of the adoption of a 
regime of Socialism or government ownership; upon the deleterious
effects of a given work policy or condition upon the
laborer, etc., are estimates. Some of the data are sufficiently
accurate for all practical purposes, are compiled under conditions
which tend to give them value — since absolute accuracy
is unnecessary — and may serve as bases upon which to formu- 
late a policy or launch a program. Such, undoubtedly, is true 
respecting the data issued by the Department of Agriculture 
at Washington on the condition of crops, on the acreage of 
cereals, etc. Absolute accuracy is not required, and the amount 
of error, tending as it does widely to distribute itself and to
remain essentially the same from period to period, is not a seri- 
ously disturbing factor. 

On the other hand, estimates made respecting conditions 
which constantly change, and upon which adequate data as 
guides do not exist, or which in themselves are impossible of 
determination, have serious limitations. Too free use should
not be made of them in shaping governmental or business poli- 
cies and in questioning social and economic institutions. The 
estimated amount of arable land in the United States is mate- 
rially increased by the completion of irrigation projects and the 
perfection of dry-farming methods. Power sites available are 
increased in number and value by the perfection of high-power 
transmission apparatus, and the available supply of precious 
metals, by the discovery and use of the cyanide process for 
separating gold from crude ores. The estimated fuel supply 
takes on new significance in the light of recent discoveries 
respecting the use of oils and the perfection of internal combustion
engines. The partial displacement of the steam by
the gasoline engine puts in a new light the consequences which 
are sometimes associated with an estimated rapidly diminish- 
ing fuel supply. 

We are, however, not concerned at present with the consequences
of a condition, the facts about which are arrived at
largely, if not wholly, through estimates, but rather with this 
method of numerically describing such condition or tendency. 
Attention is simply called to the fact that a very large proportion
of statistical data currently collected by government
and private statistical bureaus is nothing but estimates. 
They may be good, bad or indifferent; but this does not now 
concern us. They should, however, be used as estimates, and 
the limitations of the methods under which they are collected 
be fully understood. 

Whether recorded information is used, or counts or esti- 
mates made, depends upon the problem in question, the nature 
of the data needed, and the form in which they occur. In 
these respects, each problem will be differently handled. De- 
scriptively, the methods differ. 

3. THE COLLECTION PROCESS FUNCTIONALLY CONSIDERED

In collecting data — irrespective of the type which is de- 
sired and the precise methods which are used to secure them 
— there are, however, conditions which have universal appli- 
cation. There is a fundamental technique of use, usable in 
all cases. It is a function of all methods, although it is descriptively
different in each. It has to do with (1) the source of
material, and (2) with the manner in which it is secured. 

(1) Who are to be Canvassed? 

As soon as the purpose of a statistical study is stated, the 
following question immediately arises: From whom and in 
what way shall the data concerning it be secured? The first 
problem, stated in another way, is: Who shall be canvassed? 
A preliminary answer to this question can be given by a 
hurried survey of the problem and an inspection of the sources 
available. A complete and definite answer is possible only 
after a list of the possible sources of information has been
made and the types of the informants, together with the character
of the material which they possess, determined by careful
study. To illustrate: If the problem is to fix a reasonable
minimum wage for gainfully employed women, inquiry about
the wage scale in use must be directed to those who clearly
fall within the group affected. If the wage is to apply to a 
single industry, then obviously there is a double restriction 
imposed. 

Having determined the industry and the persons affected,
however, the question remains: From whom shall information 
be secured? If the prevailing wage-rate is secured from em- 
ployers alone, objections may be raised that the returns are 
inaccurate; that all cases are not included; that the data apply 
to unrepresentative seasons; that the money value of perquisites
granted is included in the wages reported; that because
of the stability of employment and the security of
tenure, these factors are capitalized and included as a part of 
the wage or counted as equivalent to monetary compensation, 
etc. If the same facts are secured from the workers alone, the 
contention may be made that records are not kept and, there- 
fore, that the data submitted are at best estimates; that no 
cognizance is taken of other things than money wages, and 
that there are evidences in the data submitted of a desire to 
make a case. Neither source may be depended upon absolutely.
In case there are irreconcilable differences in the reports
or testimony submitted, reported figures in the absence
of the actual facts will have to be taken. If any of the above 
considerations obtain, they, of course, may be given weight 
in the determination of actual conditions. A single source is 
not always adequate; it is frequently necessary and desirable 
to use various sources in order to get the facts and to see them 
in their correct light. 

Again, if the subject of study is budgets of workingmen's 
families, such questions as the following will have to be an- 
swered: Who are workingmen? Who shall be included and 
who excluded in a particular study? What national, racial, 
customary trade, occupational, and wage boundaries shall be 
set up? How many budgets can be secured? How many are 
needed and what periods must they cover in order accu- 
rately to characterize the situation? How wide must the 
survey be to be typical of the group or class? Such questions
cannot be answered offhand. The way in which they are
asked and the use which is made of the answers received re- 
quire careful consideration and the use of keen judgment and 
sound statistical sense. 

In order to measure the effects of a law which requires all 
employers of five or more persons to report industrial acci- 
dents to a central authority, and to make conditions of labor 
safe by the adoption of adequate safety devices, it is neces- 
sary to know who are affected by its provisions. Failure to 
comply with a law cannot be made punishable when the sup- 
plying of blanks for reporting accidents and recording the 
installation of safety devices, for instance, is made a condition
of the law’s operation, and this the administrative board
has failed to do. In the administration of such laws, one of 
the most difficult problems is the preparation and current correction
of lists of those to whom the law applies. A statistical
statement of the results accomplished or of the conditions ob- 
taining in industry is impossible without a determination of 
those who are affected. 

Not infrequently, conditions of time, money, and organiza- 
tion require that sources of information be omitted or that 
typical facts alone be presented. The problem then becomes 
one of sampling. What shall be used and what omitted? An 
index number of prices may be materially affected by the omis- 
sion, or by the too frequent use of a given commodity or of 
certain types of commodities. The reasonableness of a court 
decision, or of an administrative ruling as to what constitutes 
a “fair return” upon railroad property, may hinge upon the
inclusion or exclusion of certain representative railroads. The 
omission of an important sale, under the sales method of real 
estate valuation, may affect the value given to real estate in 
a given district. In the determination of a unit-value for urban 
land, how much importance shall be assigned to corner influ- 
ences, to frontage, and to relative position? Small deviations 
from the standard usually employed may make a large difference
in the value assigned. The area included may be too
large, conditions may not be homogeneous, and the resulting
unit-value not be typical. The problem is essentially one of 
judging the conditions to be included, and of determining the 
weight to be assigned to each controlling factor in order that 
the sample may be truly representative. 
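
The effect which the omission of a single commodity may have upon an index number of prices can be shown with a small computation. The sketch below assumes, purely for illustration, an unweighted arithmetic mean of price relatives; the text does not prescribe this formula, and the commodities and figures are hypothetical.

```python
from statistics import mean

# Hypothetical price relatives (current price as a percentage of the base price).
price_relatives = {"wheat": 120, "corn": 110, "coal": 95, "cotton": 140, "iron": 100}


def simple_index(relatives):
    """Unweighted arithmetic mean of price relatives (an assumed index form)."""
    return mean(list(relatives.values()))


full_index = simple_index(price_relatives)

# The same index with one commodity left out of the sample.
reduced = {name: value for name, value in price_relatives.items() if name != "cotton"}
index_without_cotton = simple_index(reduced)

print(full_index, index_without_cotton)
```

With so few commodities, leaving out the one whose price has risen most lowers the index appreciably, which is the reason the choice of the sample calls for judgment.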

Who shall be canvassed, and what conditions shall be included,
depend in large part upon whether samples will suffice,
or whether all data are necessary for an adequate picture. 
If it is decided to employ samples, care should be used (1) to
distribute them over as many categories as are represented 
in the complete data, (2) to include them in proper propor- 
tions, and (3) to guard against an undue emphasis being given 
to any particular quality or feature peculiar to a given type 
or class.
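
The first two of these precautions, representing every category and including each in its proper proportion, may be sketched as a proportional allocation of the sample. The Python below is a minimal sketch under stated assumptions: the function name, the largest-remainder rounding rule, and the category counts are all hypothetical and are not drawn from the text.

```python
def proportional_allocation(category_sizes, sample_size):
    """Allocate a sample across categories in proportion to their sizes.

    The largest-remainder rule is used so that the parts sum to sample_size.
    """
    total = sum(category_sizes.values())
    exact = {name: sample_size * size / total for name, size in category_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Hand the remaining units to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical numbers of establishments in each category of the complete data.
establishments = {"textiles": 500, "metal trades": 300, "printing": 150, "food": 50}
sample = proportional_allocation(establishments, sample_size=40)
print(sample)
```

A strictly proportional rule may assign no cases at all to a very small category, so that the first precaution may require a minimum of one case for each category; the third precaution remains a matter of judgment rather than of arithmetic.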

Comparatively few workingmen's budgets, if accurately kept
and reported, will serve to give a correct picture of the cost
of living.¹ It is unnecessary to include all individuals of the
class considered. The Bureau of Statistics in Massachusetts 
maintains that the returns from representative manufacturing 
establishments are superior to those which would be secured 
if returns from all establishments were included. What is 
desired, of course, is not a record of capital employed, wages
paid, etc., for all establishments, but only for representative 
ones. On the other hand, in the collection of statistics of 
trade union membership and the amount of unemployment, it 
is necessary to get totals for all unions. No reasons exist for 
the use of samples — the statistics are meant to be inclusive.
If they are not, the only alternative is an estimate upon the 
basis of the incomplete returns. 

Functionally, such questions as those just presented apply to
all problems upon which data are to be collected. The precise
methods used to secure the facts vary. Descriptively, they
are different; functionally, they are the same.

¹ For an interesting discussion of sampling, see Livelihood and Poverty,
by Bowley, A. L., and Burnett-Hurst, A. E., London, 1915, Chapter VI,
pp. 174-185.




(2) The Ways in which Primary Data May be Secured
a. Personal Interviews 

In business and in some social surveys it is a common prac- 
tice to secure information by interviews. Personal contact 
is established and the required data are solicited first-hand. 
Whenever this method is employed its success depends, among
other things, upon 

(1) the sanction possessed by the person making the interview.

(2) the personal qualities of the interviewer — his tact, diplomacy,
courage, and intellectual curiosity.

(3) the degree to which he understands
    (a) the problem upon which he desires information.
    (b) the psychological and instinctive reactions of those whom
    he interviews.

(4) the accuracy with which he
    (a) interprets the information supplied.
    (b) records or remembers the facts submitted.

(5) the form of record upon which the answers are put.

b. The Use of Form or of Personal Letters 

Success or failure will attend the efforts of those seeking in- 
formation by the use of form or personal letters in proportion 
as they 

(1) inquire of those who have the desired facts.

(2) are definite and precise in stating what is wanted.

(3) ask for data which are a matter of record rather than of opinion.

(4) formulate their inquiries in such a way that the units in which
the data are measured are
    (a) the same as those which are currently used.
    (b) not overlapping.
    (c) simple rather than being composite or expressed as ratios.

(5) are able to overcome the natural indifference and reluctance
to give information which is
    (a) confidential.
    (b) difficult or costly to assemble.
    (c) of use to active or potential competitors.

(6) are able to reciprocate in some way or make their findings
accessible to all.


c. The Form, Use, and Editing of Questionnaires or 
Schedules 


Questionnaires may be used either with or without personal 
interviews. Used in either way, they constitute (1) the list 
of questions which are to be asked, and (2) the form upon 
which the answers are to be recorded. If they are personally 
distributed, and their filling in is done by the agent himself, 
or under his direction, the purpose and nature of the inquiry 
to which they relate, as well as the terms used, may be ex- 
plained, doubtful points cleared up, and corroborative ques- 
tions asked. If, on the other hand, they are distributed by 
mail and filled out without assistance, then they themselves 
must carry conviction, be self-explanatory, consistent, and 
persuasive. Personal appeal for information, best made by 
human contact, is then made through the printed rather than 
the spoken word. Of course, objections to giving information, 
indifference and apathy on the part of those having the de- 
sired facts may be dispelled by personal contact in advance 
of the distribution of the schedules. But this is rarely pos- 
sible. Those who have the information are generally too nu- 
merous and too widely dispersed to be influenced in this way. 
Complete reliance must be placed in the questionnaire itself. 
Since this is necessary, the only way in which the end may 
be accomplished is to make the questionnaire adequate for 
the purpose. Accordingly it is well to observe the follow- 
ing principles of schedule making: 


(1) Assurances should be given that the inquiries are made
[pursuant] to the provisions of law, or if voluntarily undertaken,
with the [purpose] of throwing light on some particular problem.
[The reasons] for making the inquiries, and for making them of the
informants, should either be stated or be clear by [implication].
Informants generally demand assurance that the
law requires answers to be made, or that the purpose sought
to be accomplished has some really vital end.

(2) Questionnaires should be accompanied by stamped en- 
velopes for return. 

(3) They should be as brief as is consistent with the pur- 
poses which they are to serve, and the questions asked should 
unmistakably be addressed to the problem. So far as possible,
the significance of each question should be evident from
its context.

(4) Units of measurement should be clearly indicated, be 
accurately defined, and conform to common usage. Defini- 
tions and explanations should appear in the body rather than 
at the beginning or the end of schedules. 

(5) Rulings and columnar arrangement should be simple 
and definite so as to guard against the misplacing of items. 
If spaces or columns are not to be used, this fact should be 
clearly indicated. 

(6) The page should not be crowded, ample space being 
provided for all answers. Related questions should be grouped 
together. 

(7) Opportunities or occasions for making false or inaccur- 
ate answers should be guarded against by having the questions, 
so far as is possible, corroboratory. 

(8) As a rule, the making of arithmetical calculations as 
totals, percentages, etc., should be reserved for the statistical 
organization, and not intrusted to or imposed upon informants. 

(9) Questions should be simple and unmistakable as to 
meaning, should not allow of evasive answers or of double 
interpretation, should not be unduly inquisitorial, should be 
arranged logically and in the order most convenient for the informant,
should not involve duplications, should be capable
of being answered by “yes” or “no,” by number or amount,
and should always be courteous and diplomatic in tone.

The sending out, returning, and editing of questionnaires
raise some interesting problems which call for brief consideration.
As a rule, all questionnaires should be sent out at the
same time. If this is done, it will tend to allay a suspicion
which may arise in case one of a group receives his copy in
advance of others. He may feel that he is being singled out for 
special inquiry. Moreover, the simple expedient of sending 
out questionnaires simultaneously tends to guarantee against 
their being late in returning, and interfering with the process 
of tabulation and analysis. If returns come straggling in, it 
is often difficult to know when to “close,” and what to do 
with late returns. Repeated requests may be made for infor- 
mation, but the amount of pressure which can be applied in 
case of a failure to report, as well as the success which will 
attend such efforts, will depend upon (1) the importance as- 
signed to a given return or to additional information, (2) the 
mandatory power possessed by the inquirer, (3) the degree 
of co-operation which obtains between the informant and the 
person or organization seeking the information, and (4) the 
period available for delays, and the position arrived at in 
the process of tabulation and analysis. 

When schedules are returned, whether this is done by in- 
formants, or by representatives of the collecting agent, a 
certain amount of checking, editing, and revising is neces- 
sary before they can be accepted and tabulation begun. If 
agents of the collecting unit send them in, they will be uniform 
in most details, and occasions for correspondence and personal
interview regarding the meaning of certain entries obviated.
The services of agents in such cases will have been
used in making the entries rather than in correcting and ad- 
justing them after the schedules are received. 

Evident errors due to omissions, additions, false entry, confusion
of items, etc., can be readily corrected. Undue tampering
with the facts, however, is dangerous. Alterations should
be made only in cases of unmistakable error. It is an easy
matter materially to change the meaning and to distort the
answers by the interchange or erroneous correction of
[items. The] will to deceive may not be present at all,
[and yet the] same results follow as if it were. If questions
have been uniformly misunderstood, the basis for change is
certain. If, however, the relations between items are made to 
agree with what in the editor's opinion “ought” to be the
case, then the data are used merely to support individual 
opinion. 

The degree to which omissions may be allowed or error 
countenanced is also of importance. If entries tend unmis- 
takably to confirm an ascertained fact, and the samples are 
representative, then the omission even of a number of ques- 
tionnaires may be tolerated. If, however, the evidence is 
uncertain or conflicting, the trend or the relations being 
indefinite, then the omission of an item in a comparatively
few cases may be a serious matter. It may be that these are 
the very items which are needed to decide the case in point.
No rule of tolerance can be formulated which will cover all 
such cases. If the range for discrimination is wide, or dis- 
cretion given too wide a latitude, final results may be deter- 
mined quite as much by the judgment of the editing official 
as by the data themselves. 

Many of the same considerations apply in the case of error. 
If errors tend to correct each other, a considerable degree 
of inaccuracy may be allowed. If, however, they tend to 
become cumulative, then their presence is of serious conse- 
quence and every effort should be made to remove them. 
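
The distinction between errors which correct each other and errors which accumulate may be made concrete with a small computation. The figures below are hypothetical returns, invented solely to illustrate the point; the function name is likewise an assumption.

```python
def total_error(true_values, reported_values):
    """Net error in the aggregate: sum of reported values minus sum of true values."""
    return sum(reported_values) - sum(true_values)


# Hypothetical true values for six returns.
true_values = [100, 100, 100, 100, 100, 100]

# Compensating errors: some returns are too high and others too low.
compensating = [103, 97, 102, 98, 104, 96]

# Cumulative errors: every return is overstated in the same direction.
cumulative = [103, 103, 102, 102, 104, 104]

compensating_error = total_error(true_values, compensating)
cumulative_error = total_error(true_values, cumulative)
print(compensating_error, cumulative_error)
```

The individual errors are of the same size in both sets; only in the second do they carry through to the total, which is why a bias in one direction is the more serious matter.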

These different aspects of editing may be illustrated by 
considering the uses of the “sales method” of determining 
real estate values. All biased errors must first be removed. 
These are interpreted to include, among other things, sales 
involving nominal considerations; transfers between relatives; 
and land contracts or other conditions which in any way cloud 
the titles. Only sales between ready and willing buyers, and 
ready and willing sellers, and accompanied by full warranty 
deeds, are held to be valid for this use. By eliminating 
“doubtful” sales, however, the number actually available as 
a basis for deciding what the value is in a particular district 
may be inadequate. If this occurs, then shall sales made
between relatives, when the values represented by them essentially
agree with the findings when they are omitted, be included?
Provided the value thus determined is warranted,
to use them would tend to confirm the value arrived at on the 
basis of other sales. If it is not warranted, then their inclusion
supports a conclusion which in and of itself is incorrect,
and weight would need to be given to the conditions under 
which the sales were made. Their inclusion, on the other hand, 
may materially change the values assigned to a given dis- 
trict, and yet, from the evidence available, it may be clear 
that they represent true value. The only consideration 
against their use is the relations of the grantees and grantors 
— relations which normally would make it inadvisable to use
them in order to determine land values. 

Moreover, how many sales are necessary to establish a unit 
value? With twenty sales, the unit value might be $100 per 
front foot; with twenty-five sales, $105, and with eighteen 
sales, $95. How many sales should be included? 
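
The dependence of the unit value upon the sales admitted may be sketched as follows. The prices are hypothetical and were chosen only so that the averages reproduce the figures just given; the grouping of the sales into three lists, and the reasons suggested for it, are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical prices per front foot (not actual sales records).
undisputed_sales = [95] * 18        # sales admitted without question
doubtful_sales = [145, 145]         # e.g. transfers between relatives
further_sales = [125] * 5           # additional sales of uncertain standing


def unit_value(prices_per_front_foot):
    """Unit value taken as the simple average price per front foot."""
    return mean(prices_per_front_foot)


value_18 = unit_value(undisputed_sales)
value_20 = unit_value(undisputed_sales + doubtful_sales)
value_25 = unit_value(undisputed_sales + doubtful_sales + further_sales)
print(value_18, value_20, value_25)
```

The arithmetic settles nothing by itself; it merely shows how far the result depends upon the editor's decision as to which sales are admitted.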

Such considerations as these are involved in every statistical
problem and in the collection and use of statistical data,
no matter whether they apply to land valuation, price de- 
termination, studies of wages, cost of living, or what not. To 
edit primary data requires sound judgment and keen dis- 
crimination. 


IV. Conclusion 

This chapter has had to do with the collection of primary 
data and with their preparation for use. The discussion is 
intended primarily as a manual of instruction rather than as 
an encyclopedic treatment. If the points of view developed 
are kept constantly in mind, and there is real desire to profit 
by them, subsequent steps will be easier and the reader will 
have the assurance that he is employing in a scientific man- 
ner a delicate, though frequently abused, method of induction 
— statistical methods. 

The personal element stands out as an important factor in
all that has been said. Statistics do not answer questions or
support conclusions independently of those who manipulate 
them. Judgment, candor, and integrity are necessary at every 
step. One must know the field in which he is working, its 
statistical possibilities, and what has been done. He must 
also realize the difficulties under which data are collected, the 
precise manner in which they are to be used, the sources and 
possibilities of error and bias, etc., and the ways of detecting 
and eliminating them. In a word, he must understand what is 
involved in the preparation of an intellectual tool, and then 
in the light of his knowledge use it intelligently for the pur- 
pose in mind. If it is faulty, he should know and acknowledge 
it. If it is well fitted for his purpose, that fact should be evi- 
dent in the uses which are made of it. To be a good statis- 
tician one has to be more than a technician, but technique 
cannot be ignored.

CHAPTER IV


UNITS OF MEASUREMENT, OF ANALYSIS, AND OF 
PRESENTATION IN STATISTICAL STUDIES 

Passing from the more general statement of the methods 
of collecting statistical data, and of the principles involved in 
the collection process, the significance of such expressions as
units of measurement, of analysis, and of presentation will be 
clearer if they are discussed separately in connection with con- 
crete problems. This is done in this chapter. 

I. The Meaning of Statistical Units of Measurement

The statistical approach to a subject is numerical. Things,
attributes, and conditions are counted, totaled, divided, subdivided,
and analyzed. It is concerned not with single instances
or with rare occurrences, but with aggregates.¹ The
statistical process requires both analysis and synthesis, numerical
preponderance being the chief basis for conclusions based
upon such aggregates. 

Statistical frequencies or amounts relate to units of meas- 
urement which are characteristic of the things or conditions 
studied. It is not 1000 as an abstract unit, but 1000 farms, 
industrial establishments, loans, and mortgages, which are 
considered. Abstract numbers or frequencies, on the other 
hand, may be combined, separated, and divided in an infinite 
number of ways because they are homogeneous. They are 
quantitative symbols only. Amount or size merely indicates
the presence or absence of a condition which is abstractly
represented. Thus, units of length, width, and volume, conceived
of in this manner, may be added, subtracted, or otherwise
treated numerically as fancy dictates or necessity demands.
This is done without any attention being paid to the
units to which the symbols apply. They do not have to be
adjusted to each purpose for which they are employed. For
instance, a linear foot, as an abstract unit, is always 12 inches,
a meter 39.37 inches, an American gallon 231 cubic inches.
They may be combined with like units and converted into
terms of each other without any serious inconvenience or risk
of misunderstanding or confusion.

¹ “Statistics * * * does not deal with a single homogeneous mass but
with a complex body composed of multitudinous units differing in form
and action one from the other; and it is with the complex not with
the units that it is concerned.” Bowley, A. L., Elements of Statistics,
King, London, 1907, p. 262.

The same cannot be said of units of measurement dealt with 
in statistics. They are not abstract: they relate to some 
thing or condition which is concrete. Abstractly, all “ton-miles”
are alike; concretely, they are different. While a ton
is invariably a ton, and a mile a mile, all tons, except as to 
the one quality, weight, are not necessarily the same, nor are 
all miles, except as to the one quality, distance, always equiva- 
lent. One ton may be bulky, low-grade freight; another ton 
may be compact, high-grade freight. One may be the meas- 
ure of a quantity of stovepipe elbows; the other, of a quan- 
tity of silks. Likewise, one mile may be of easy grade in a 
prairie; the other of heavy grade in mountainous tunnels. The 
conditions necessary to the movement of one ton one mile — 
the ton-mile — may be wholly dissimilar in spite of the common
name which is assigned to the service. Statistical units
have reference to things or attributes of things under different 
circumstances; combinations of them at will cannot be made. 
The fact is that in statistics, units of number, size, and fre- 
quency are dealt with not abstractly but concretely. 

Units of measurement having to do with business, economic, 
and social affairs are often indefinite and general. By different
people and under different circumstances, the same things
are called by different names; or different things are called 
by the same name. Thinking and reasoning about them are
confused. People do not understand each other's use of terms.
They do not use words and phrases having the same meaning 
or connotation, and, accordingly, interpret the same phenom- 
enon in different ways or different phenomena in the same way. 

Because of this and other facts, statistical measurements 
are often meaningless. Quantitative symbols are used to measure
abundance or to indicate scarcity — more or less — but the
symbols are attached to things which have different meanings. 
They are combined and averaged as though they were ab- 
stract. Confusion results when this is done. What is too 
often done is not to measure the frequencies of occurrence of 
the same thing, but of different things which are given the 
same name. An illustration involving the meaning of a unit 
will indicate the nature of the problem of statistical meas- 
urement. 

If it is necessary to enumerate the number of “manufacturing
establishments” in a given district, the definition of this unit
will obviously be determined by the following, among other,
conditions: (1) the meaning of “manufacturing” as distinct
from trading, mercantile, transporting, agricultural, etc., pursuits;
(2) the meaning of an “establishment.” The definitions
employed will depend upon the purpose in mind in using them.
If it is to learn the number of such enterprises, and the test
of identity is separate ownership, there may be many or few
“establishments.” If other tests, such as independent operation,
unit housing, unit processes, unit management, contiguous
location, etc., are imposed, then different numbers of “establishments”
will be found. In such cases it is not enough to
maintain that an establishment is an establishment. The 
identity, and therefore the number to be enumerated, depends 
upon the criteria which are used to distinguish them. The 
statistical process of grouping and combining individual in- 
stances into aggregates and of averaging them is impossible 
unless the units enumerated are identical in the particulars 
chosen as a basis for enumeration. 
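
The dependence of the count upon the test of identity can be sketched in a few lines. The plant records below are hypothetical and serve only to illustrate the point; the function name `count_establishments` is likewise an assumption made for this sketch.

```python
# Hypothetical plant records: (owner, street address). Illustrative only.
plants = [
    ("Acme Co.", "12 Mill St."),
    ("Acme Co.", "40 River Rd."),
    ("Acme Co.", "9 Dock St."),
    ("Baker & Son", "12 Mill St."),
]

def count_establishments(records, key):
    """Count distinct establishments under the identity test given by `key`."""
    return len({key(r) for r in records})

# Test of identity: separate ownership.
by_owner = count_establishments(plants, key=lambda r: r[0])
# Test of identity: separate location.
by_location = count_establishments(plants, key=lambda r: r[1])
# Test of identity: separate ownership and separate location together.
by_owner_and_location = count_establishments(plants, key=lambda r: r)

print(by_owner, by_location, by_owner_and_location)
```

The same four records yield a different number of "establishments" under each test, which is why the criterion has to be fixed before the enumeration begins.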

Another example of a somewhat different type may be given
in this connection. It is desired to determine the “industrial
accident rate” in a given industry as a basis for fixing a scale
of compensation for accidents. What is an “accident”? Obvi- 
ously, the reason for compensation is personal injury with its 
attendant consequences, and it is the character of the injury 
which serves as a basis for enumeration. All injuries involv- 
ing a loss of any time, howsoever slight, might be thought 
worthy of inclusion. But since compensation is the occasion
for determining the number, only those injuries to which an 
appreciable loss of time is due should be included. What is 
an “appreciable” loss of time? To an individual who experi- 
ences the loss, such an amount might be any time, howsoever 
slight. To the employer, however, who advances the compensation,
and to the public who finally bear it, a period of one
or two weeks might be thought to be the minimum compens- 
able period. But many trifling accidents may, in the aggre- 
gate, occasion a far greater loss of time than a single or a 
few serious ones. There would be no hesitancy about count- 
ing the serious ones, yet there might be doubt about including 
the minor ones. But it is precisely the latter which can most
frequently be prevented, and about which information may 
be desired, because precautionary measures which involve little 
added cost to the employer, increased efficiency to the employee,
and the gradual elimination of the occasion for compensation,
may be taken to reduce them.

Moreover, by hypothesis, only industrial accidents are to be 
compensated. When accidents are enumerated for this pur- 
pose, self-inflicted injuries, as well as those occurring to work- 
men while not engaged in industrial operations, and when 
work done is not a proximate cause of injury, should be elim- 
inated. Is “disease,” contracted directly as a result of the 
conditions of industry, an “accident”? Surely it is an “in- 
jury,” and if injury is the basis of compensation, ought not 
diseases contracted in this way to be counted? If they are 
counted as an industrial injury (not “accidental,” but characteristic
or regular), how should instances involving impairment
of health, mental or physical ability, be considered? How
long a period must elapse before a condition, the result of em- 
ployment, ceases to be checked against such employment? 
What is an industrial accident for compensation purposes? 

The unit of measurement, however, is the rate of industrial 
accidents. Not all occupations are equally hazardous, and to 
refer to industries the accidents occurring, irrespective of the 
occupations involved, is equivalent to assigning them to condi- 
tions which they cannot produce. Moreover, the number of 
accidents which occur is a function of the number of persons 
exposed to risks and the periods of exposure (the man-hours
or man-days). In using reported accidents as a basis for compensation,
care, therefore, must be taken to assign the results
to conditions which produce them.
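
A minimal sketch of the point about exposure follows. The two occupations and all figures are hypothetical, and the scaling bases (per 1,000 workers, per 1,000,000 man-hours) are assumptions chosen only to give readable numbers.

```python
# Hypothetical figures for two occupations in one industry (illustrative only).
occupations = {
    "switchman": {"accidents": 12, "workers": 40, "hours_per_worker": 2500},
    "clerk": {"accidents": 3, "workers": 60, "hours_per_worker": 2000},
}

def rate_per_worker(o, per=1000):
    """Accidents per `per` persons employed, ignoring length of exposure."""
    return o["accidents"] * per / o["workers"]

def rate_per_man_hour(o, per=1_000_000):
    """Accidents per `per` man-hours of exposure."""
    man_hours = o["workers"] * o["hours_per_worker"]
    return o["accidents"] * per / man_hours

for name, o in occupations.items():
    print(name, rate_per_worker(o), rate_per_man_hour(o))
```

Stating the rate for each occupation separately, and on a man-hour basis, refers the accidents to the persons actually exposed and to the time during which they were exposed.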

If the purpose in enumerating industrial accidents were, on 
the other hand, to measure the total amount of time lost 
through mental or physical injury, obviously all accidents and
all diseases directly attributable to industry should be in- 
cluded. If the purpose were alone to secure information to be 
used as a basis for removing the conditions causing accidents, 
or for assigning responsibility for them as between employer 
and employee, machine and injured person, those which were 
trivial, from the point of view of the individual, would take 
equal rank with those which are called severe. What is an
“industrial accident”?

Inquiries similar to the ones suggested respecting accidents 
must always be made and answered before the collection of 
primary, or the use and analysis of secondary data respecting 
any problem, is begun. It is not sufficient to study mere fre- 
quency, but frequency relating to the units chosen, and the 
units in their particular applications to the problems under 
consideration. 

To formulate the purposes for which statistics are to be 
collected and used is the first step in statistical studies; rigidly 
and unmistakably to define the units of measurement in 
which the aggregates are expressed, and to adhere to them
throughout the process, is the second. The latter is governed
by the former, as the former is determined by the latter. The 
two are reciprocal. Statistical units cannot be defined with- 
out regard to their purpose, and their purpose cannot be out- 
lined with sufficient accuracy to be carried out without a clear 
notion of the units. 

Probably enough has been said to bring to the reader's at- 
tention the meaning of units of measurement and the distinc- 
tions which must be made between the use of abstract con- 
cepts of mass or frequency in mathematical calculations and 
the use of the concepts in statistical studies. Statistics in- 
volves more than numbers and quantities. It is quantita- 
tive but has to do with more than numerical computations. 
It is concerned, as has been said, with the processes and 
methods of formulating and testing conclusions from premises 
resting solely upon numerical bases.

II. Statistical Units of Measurement Classified and Described

It will be of assistance in understanding units of measure- 
ment to classify the different types and to describe their sig- 
nificance. Distinction should be made between (1) units of 
enumeration or estimation, and (2) units of analysis and in- 
terpretation. 

The first are those in which measurements are made; the 
second are those in which they are compared. The first have 
primarily to do with collecting data; the second with com- 
paring them. 

1. UNITS OF ENUMERATION OR ESTIMATION

The units in which data are enumerated or estimated are
either simple or composite. A simple¹ unit is one which is
general in meaning, class differences only being distinguished.
Examples of such units are the following: a farm, a ton, an
accident, a strike, a lockout, an immigrant, a room, a street,
a draft, a bill of exchange, a deposit, a novel, a citizen, etc.
Such units are easily distinguished; they are mutually exclusive.
No distinction is provided for degrees of similarity, but
only for absolute differences. Such units have no limiting
qualifications.

¹ See the discussion, supra, pp. 35-36.

In contrast with simple units are those which are called
composite.² Composite units are formed by adding to simple
units a limiting or qualifying word or phrase, the effect of
which is (1) to define more accurately the general concept,
(2) to restrict the class which it names, and (3) to add to the
difficulty of defining it. For instance, a “sale,” as a simple
unit, becomes composite by adding to it the limiting word
“credit.” The unit is now a “credit sale.” To identify it,
it is necessary not only to distinguish the condition of “sale”
from that of purchase, for instance, but also to define what is
meant by the term “credit.” The simple unit “ton” becomes
a composite unit by the addition of the word “freight.” Similarly,
an “accident” becomes an “industrial accident,” etc.

To convert simple into composite units sometimes has the 
effect of changing the meaning and use, as well as the scope, 
of the term. For instance, the unit “room,” in a survey con- 
ducted solely to determine the size of rooms in tenement build- 
ings, might be defined as any portion of a house, habitually 
used as a place of abode, set off by walls with exits either 
closed or capable of being closed. To add to this unit the 
limiting word “sleeping” suggests so many considerations re- 
specting light, ventilation, size in respect to number of occu- 
pants, time of occupancy, etc., as to alter materially the 
meaning attached to it when the counting is undertaken to
determine size, but not size in connection with use.

To repeat, statistical processes are not confined to counting
or combining abstract units, but have to do with those relating
to particular circumstances and particular problems. For
instance, it is desired to compare the illiteracy among Southern
European immigrants and the American negroes. It would
be clearly an error to make this comparison until the meanings
of “immigrant” and “negro” were definitely settled, until comparable
sex and age classes were specified, and until the same
or comparable tests for determining illiteracy were employed.
Illiteracy tests established for immigrants may not have been
the same as those used for negroes. The tests for the immigrants
may not have been adjusted for the different age classes,
nor determined according to standards characteristic of the New
World. Moreover, they may have been influenced by the standards
used to distinguish immigrants from non-immigrants.

² See the discussion, supra, p. 36.

The point emphasized is the necessity of reducing the conditions
in every unit to a homogeneous basis. Those which
are conflicting and overlapping cannot obtain. This applies 
particularly to cost accounting where it is necessary that cost 
data be reduced to their most elemental units. If composite 
or compound units are used, comparisons, except under the 
most favorable circumstances — circumstances which seldom 
if ever exist — are meaningless. This contention is brought out 
in the following citation relating to the use of cost units in 
New York City. 

“An example of the weakness of the usual cost data is shown by
the cost per square yard for certain paving work done by five different
gangs under different foremen. I have in mind a single day's
work for these gangs. The work to be done was identical yet the
cost ranged from $1.11 per square yard to $1.89. This cost data was
worthless on its face because it did not analyze the cost into the
constituent elements. It accepted the compound unit cost as final.
By going back of the unit cost per square yard we find the reason
for the difference in cost for doing the same thing under similar
conditions. We base everything on elemental³ cost data. By this
is meant the unit cost of each element that enters into the performance
of a thing as, for instance, the laying of a square yard of
asphalt pavement. The fact that it cost only $1.70 for laying a
square yard of asphalt pavement is absolutely useless and misleading
unless we know all of the facts entering into the cost of laying
the pavement.” (Here follows a statement of thirty elements to
be considered in making such comparisons.) * * * “The fact is that
one square yard of asphalt may be cheap at $2.00, while another
square yard may be high priced at $1.00.

“Another trouble with compound⁴ units cost data is that it compares
entirely dissimilar things with each other. * * * The number
of square yards to be done has a marked effect upon the unit cost
per square yard and the conditions under which the work is done
will have an even more marked effect.”⁵

³ Italics mine.

2. UNITS OF ANALYSIS AND OF INTERPRETATION 

In contrast with units in which things or attributes of
things are named, as for instance by the simple units “stores,”
“houses,” “sales,” or by the composite units “chain stores,”
“bond houses,” “forced sales,” are those in which things or
attributes of things are compared as well as named. To compare
things they must be placed in relation to each other. To
do this requires the use of ratios, or coefficients,⁶ as they are
sometimes called.

Comparisons may relate to time, to space, or to conditions 
in time or space. Illustrations of ratios or coefficients involving
these points of view will serve to make the distinctions
clear. 


(1) Ratios or Coefficients Relating to Time 

Sales of retail stores or the wages of working men may be
expressed in dollars, but related to days, months, or years.
If in comparing sales, the time unit year is taken, such a period
may be unsuitable, because, in the different establishments,
(1) there may be a seasonal element in one line of trade and
not in another; (2) the goods sold may have different seasonal
characteristics; (3) the sales in one may be spread over the
entire period; in another, be crowded into a few months; (4)
the beginning and close of the year may vary.

⁴ Italics mine.

⁵ Adamson, Tilden, “The Preparation of the Estimates and the Formulation
of the Budget: The New York City Method,” in The Annals of
the American Academy, November, 1915, Whole No. 151, Vol. LXII, at
pp. 253-255.

⁶ See the discussion, supra, pp. 36-38.

If wages of workingmen in different industries, although
expressed in dollars, are related to days (that is, if the
coefficient “dollars per day” is used), comparisons may be faulty
because (1) the days are of unequal length, or (2) the number
of days customarily worked in a year, for instance, is different.
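
The weakness of “dollars per day” can be shown with a short sketch. The two industries and every figure in it are hypothetical; the helper names `per_hour` and `per_year` are assumptions made for the illustration.

```python
# Hypothetical wage data for two industries (illustrative only).
wages = {
    "A": {"dollars_per_day": 4.00, "hours_per_day": 10, "days_per_year": 300},
    "B": {"dollars_per_day": 3.60, "hours_per_day": 8, "days_per_year": 250},
}

def per_hour(w):
    """Daily wage reduced to an hourly basis (days of unequal length)."""
    return w["dollars_per_day"] / w["hours_per_day"]

def per_year(w):
    """Daily wage carried to a yearly basis (unequal days worked)."""
    return w["dollars_per_day"] * w["days_per_year"]

for name, w in wages.items():
    print(name, w["dollars_per_day"], round(per_hour(w), 2), round(per_year(w), 2))
```

In this made-up case industry A pays more per day and per year, yet less per hour, so the order of the two industries depends upon the time unit to which the dollars are related.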

Again, industrial accidents may be expressed by number or
by severity, but related to years. Those occurring in different
plants, however, within a given year, may vary because of
(1) the number of days the plant operates; (2) the number of
employes used, and the length of time they work; (3) the
relative hazard of each occupation; (4) the different proportions
of the total force engaged in the hazardous occupations.

(2) Ratios or Coefficients Relating to Space

For different states the amounts of wheat raised during a
given season or year may be expressed in bushels. They may
be related to 100 square miles of territory, counties, farms, etc.
The space units (the denominators of the different coefficients)
may be unsuitable for comparing different yields because
(1) not all square miles, counties, or farms produce wheat;
(2) the counties and farms may be of different size; (3) different
proportions of the square miles, counties, and farms may
be used for wheat production.
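
The effect of the choice of denominator may be sketched as follows. The two states and their figures are hypothetical, and the use of acres actually sown to wheat as the more suitable denominator is an assumption of this sketch, not a prescription taken from the text.

```python
# Hypothetical wheat figures for two states (illustrative only).
states = {
    "X": {"bushels": 1_000_000, "farms": 5_000, "wheat_acres": 50_000},
    "Y": {"bushels": 600_000, "farms": 2_000, "wheat_acres": 40_000},
}

def bushels_per_farm(s):
    """Yield related to all farms, whether or not they grow wheat."""
    return s["bushels"] / s["farms"]

def bushels_per_wheat_acre(s):
    """Yield related only to the land actually used for wheat."""
    return s["bushels"] / s["wheat_acres"]

for name, s in states.items():
    print(name, bushels_per_farm(s), bushels_per_wheat_acre(s))
```

State Y leads when bushels are related to farms, and state X leads when they are related to wheat acreage; the denominator, not the wheat, produces the reversal.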

Again, sales may be expressed in dollars, and related to
hundreds of square feet of floor space. But (1) not all floor
space is used for sales purposes; (2) the proportions of the
total used for this purpose, in different establishments, vary;
(3) the floor space is probably not uniformly placed with respect
to floors, frontage, etc.; (4) the types of goods sold on
different parts of the space used for the purpose vary in price,
at a given time, and during different seasons of the year;
(5) different grades and proportions of the same variety of
goods are displayed, etc.

(3) Ratios or Coefficients Relating to Condition

Deaths during a given period, or for a given area may be 
expressed in numbers. They may be related to the entire 
population or to the population of the same age and sex characteristics.
If the first basis is used, the coefficient (deaths
per 100,000 of population, for instance) is faulty, because (1)
all elements of a population are not equally likely to die; 
(2) the age and sex characteristics of populations in the same 
place at different times, and in different places at the same 
time are not necessarily the same; (3) the proportions of the 
total deaths from different causes may vary from period to 
period, and from place to place; (4) epidemics of the same 
duration causing deaths may not be regular in their occur- 
rence, universal in their appearance, nor equally deadly in 
their effect. 

If deaths are related to populations of the same age and sex 
characteristics, some, but not all, of the limitations of the 
cruder bases are removed. 
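
The contrast between the crude and the more specific basis can be sketched with two hypothetical cities. All figures are invented for illustration, and only age is classified here, although the text speaks of age and sex together.

```python
# Hypothetical (population, deaths) by age class (illustrative only).
cities = {
    "P": {"under 50": (80_000, 400), "50 and over": (20_000, 1_000)},
    "Q": {"under 50": (50_000, 250), "50 and over": (50_000, 2_500)},
}

def crude_rate(city, per=100_000):
    """Deaths per `per` of the entire population."""
    population = sum(p for p, _ in city.values())
    deaths = sum(d for _, d in city.values())
    return deaths * per / population

def specific_rates(city, per=1_000):
    """Deaths per `per` of the population of the corresponding ages."""
    return {age: d * per / p for age, (p, d) in city.items()}

for name, city in cities.items():
    print(name, crude_rate(city), specific_rates(city))
```

In this sketch the two cities have identical rates within each age class, yet the crude rate of Q is nearly double that of P, solely because Q has the older population.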

Again, total operating expenses of retail establishments may 
be expressed in dollars. The amounts may be related to $100 
of sales. The coefficient would then become “total expenses 
per $100 of sales” — expenses constituting the numerator, and 
sales in hundreds the denominator of the ratio. But (1) all 
expenses do not have to do with sales; (2) both expenses and 
sales in different stores result from different types of services 
rendered and goods sold; (3) the proportions of the expenses 
and the sales, attributable to different sources, vary. 

The turnover of retail merchandise during different periods,
for stores of different size, or with different location may be 
measured. The number of turns is secured by dividing the 
cost of merchandise sold by the amount of average inventory 
or stock on hand taken at cost price. That is, a coefficient is 
employed. Both the merchandise sold and the stock on hand 
are taken at cost price. To express the numerator in terms 
of cost and the denominator in terms of sales price is incorrect
because (1) cost and sales bases are not identical; and
(2) gross margins (the difference between the cost of goods
and their sales price) may not be uniform for different types
of goods, nor for different merchants.
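
A short sketch of the turnover coefficient follows. The store figures are hypothetical; the formula itself (cost of merchandise sold divided by average stock at cost) is the one stated above.

```python
# Hypothetical figures for one store in one year (illustrative only).
cost_of_merchandise_sold = 60_000      # dollars, at cost price
average_stock_at_cost = 15_000         # dollars, at cost price
average_stock_at_sale_price = 20_000   # same stock valued at sale price

# Numerator and denominator on the same basis.
turnover_corrected = cost_of_merchandise_sold / average_stock_at_cost

# Numerator at cost, denominator at sale price: the bases are mixed.
turnover_crude = cost_of_merchandise_sold / average_stock_at_sale_price

print(turnover_corrected, turnover_crude)
```

The mixed basis understates the number of turns by an amount that depends upon the gross margin, so two stores with equal true turnover but different margins would appear to differ.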

These illustrations of coefficients or ratios relating to time, 
space, and condition will suffice to make the distinctions be- 
tween them clear. They probably do not, however, make it 
plain why some coefficients are satisfactory and others unsat- 
isfactory. This may be done by stating the general principle 
which should be followed in setting up all types of coefficients. 
The following are different ways of expressing the essential 
idea. 

1. Compare only those things or attributes of things which 
are alike or have common qualities. 

2. “Always relate effects to the causes producing them.”

3. The denominator in every coefficient should relate specifically
to the condition named in the numerator.

4. “The numerator should be homogeneous and the denominator
should be homogeneous, and each unit in the denominator
should bear the same potential relation to the attributes
of the units in the numerator.”

If these rules are not followed, comparisons break down. 
The result is that “crude” rather than “corrected” units are
employed. The “crudity” may relate to a time, a space, or
a condition factor, depending upon the type of unit which is 
used. To correct a coefficient is to follow the principle stated. 

Comparisons relating to remote periods, widely separated
places, or different conditions are always questionable.⁷ Too
great care cannot be taken to make them legitimate. This is
particularly true in the case of statistical comparisons, since
they are numerical and seemingly exact. A numerical statement
of a fact is often taken by the unwary and uninitiated
as sufficient proof of its absoluteness and finality, and is made
to support predetermined conclusions or premises to which it
has no relation. A rigid adherence in the collection of primary,
and in the use of secondary data, to the principles here formulated
respecting units, will help the reader to use statistical
facts in a scientific manner.

⁷ The following cautions are of interest respecting the difficulties of
comparing railway statistics in the United States and foreign countries:
“Attention is called especially to the fact that the strict comparability
of all the items throughout this bulletin is not assured, even
by the greatest care in compilation. It would be an impossible task so
to tabulate and adjust the railway statistics of a number of countries,
differing from each other in so many respects, as to place them on a
strictly comparable basis. Every attempt to present a comparison
between statistics of different countries encounters practically insuperable
obstacles to complete comparability. These spring from numerous
differences in the classification of data, in the composition of accounts,
and in the organization and character of the railway service. A few
examples will illustrate the point.

“In most European countries the term ‘freight,’ as employed in the
statistics of freight tonnage and freight revenue, includes a large part 
of such traffic as is carried by express companies in the United States. 
... A great part of such traffic is carried on fast freight trains along
with what Americans designate ‘package freight.’ It is in most respects 
a part of the fast freight service, rather than an express service, as 
that is understood in the United States. Besides the question of expediency,
is the impossibility (since both kinds of traffic are carried on the
same freight trains) of determining for comparison on the train-mile
basis the freight train-miles, in the American sense of the term, that 
would correspond to the revised tonnage and revenue statistics obtained 
by eliminating this sort of express traffic. By leaving this traffic in the 
tonnage and revenue statistics for freight, the data for each country 
are at least self-consistent. 

“Differences in the character of the service affect the comparability of
average receipts per passenger-mile and per ton-mile. In the case of 
the passenger service, practically all countries other than the United 
States and Canada offer a great variety of accommodations. And in 
those countries the cheaper accommodations, much inferior to that of 
the usual ‘day coaches’ here and in Canada, are far the more extensively 
used. As a result, the average revenue per passenger-mile is greatly 
reduced on account of the preponderance of traffic in the second, third, 
and even fourth classes. No allowance can be made for this difference 
by any adjustment . . . 

“In the case of the freight service, the railways of the United States 
carry freight to a far greater extent in wholesale lots than in any other 
country except Canada. European countries, including England, cater 
to frequent, quick delivery of small shipments. The result is a more 
expensive service and a higher average charge. Furthermore, the average 
length of haul in the United States is . . . greater than in any other 
country. A comparison of the average receipts per ton-mile from the 
freight traffic as a whole in the United States and other countries is 
thus not a comparison of receipts for quite the same kind of service.” 
“Comparative Railway Statistics, United States and Foreign Countries,
1912,” Bureau of Railway Economics, Consecutive No. HS, Miscellaneous
Series No. ^1, 1915, Washington, D. C., pp. 7-8.

III. Statistical Units of Presentation

Section II, immediately above, had to do with the different
types of units in which statistical data are measured and compared.
These were classified as (1) simple units, (2) composite
units, and (3) ratios or coefficients. But data are not
only measured and compared; they are also presented. It is
the various types of presentation units with which we are
now concerned.

Age, for instance, may be measured to the nearest day, 
month, or year; size of city to the nearest thousand; and 
expense to the nearest dollar. Similarly, the composite units, 
selling expense, cost of merchandise sold, full-time salesmen,
freight receipts, etc., may be recorded, counted, or estimated. 
Again, coefficients may be built up accurately or inaccurately. 
Simple and composite units have to do with enumeration or 
estimation; coefficients, with enumeration or estimation, and 
comparison. All of them involve measurements; they have 
nothing to do with the manner or way in which the measure- 
ments are presented. 

Units of presentation are of three types: (1) time, (2) space, 
and (3) condition. For instance, the operating expenses of 
a group of retail meat stores may be measured to the nearest 
thousand and be presented by years, by location, by size and 
by nature of management; age may be measured to the near- 
est month, and be presented by years; heights may be meas- 
ured to the nearest quarter of an inch, and presented in whole 
inches; live stock may be counted by farms, and be presented 
by states; railroad earnings may be secured by months and 
be presented by ten-year periods, etc. 

Units of presentation involving time are crude when the in- 
tervals used exceed those to which the measurements apply. If 
earnings, for instance, are determined by months and show sea- 
sonal changes, accuracy is sacrificed by expressing them by 
years or groups of years. 
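
What is sacrificed by presenting monthly measurements by years can be sketched as follows. The monthly earnings are hypothetical, and the seasonal index used here (each month as a percentage of the average month) is an assumption adopted for the illustration.

```python
# Hypothetical monthly earnings, in thousands of dollars (illustrative only).
monthly_earnings = [8, 7, 9, 10, 12, 15, 18, 17, 13, 10, 9, 8]

# Presentation by year: a single figure.
annual_total = sum(monthly_earnings)
monthly_mean = annual_total / len(monthly_earnings)

# Presentation by month: each month as a percentage of the average month.
seasonal_index = [round(100 * m / monthly_mean) for m in monthly_earnings]

print(annual_total)
print(seasonal_index)
```

The yearly figure is the same for any arrangement of the twelve months, whereas the monthly presentation shows the slack month standing far below, and the busiest month far above, the average.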

Units of presentation involving space are crude when the
areas used extend beyond those to which the measurements
apply. If population density in cities, for instance, is meas- 
ured by blocks, and conditions vary in different parts of a 
city, the significance of this variation is lost by presenting 
the data by wards. 

Units of presentation involving condition are crude when 
the class limits used are so broad as not to reflect differences 
observed in measurement. If costs of doing business, for in- 
stance, vary directly with volume of sales, then they should 
be presented in groups which will disclose this fact. Or, if 
costs of manufactured goods vary according to pattern of prod- 
uct, they should not be shown alone by entire output. 

To convert crude into “corrected” units of presentation is 
to allow the peculiarities discovered in the measurements to 
be reflected in the way in which they are presented. To illus- 
trate such a process: The costs of doing business are found 
to vary with location. This fact is discovered from the meas- 
urements themselves. How shall they be presented? Ideally, 
every variation should be indicated. Practically, this is im- 
possible. Hence, areas are grouped, and cities classified ac- 
cording to size, the purpose being to select those units of 
presentation which will best reveal the peculiarities of the 
phenomena measured. 

In general, the aim is to adopt that unit of time, place, or 
condition for presentation which will give the facts vitality 
and make them serve most fully the purposes for which they 
were collected or assembled. Statistics collected without a 
well-defined purpose are seldom of much value because of the 
lack of care in their preparation, and because of the absence 
of a controlling purpose in their presentation. 

“Science has derived very little or no benefit from the miscellaneous
collecting and grouping of facts without any previous notion
of what they are likely to reveal. An investigation is usually
made for the purpose of answering a definite question, or of verifying
an anticipation. With some such end in view, with some principle
by which the classification is guided, the result usually reveals
not only what is looked for, but frequently still more fundamental
characteristics * * *”⁸


Too frequently the groups into which facts are crowded are 
so broad, purposeless, and indefinite that whatever value the 
facts may have had as collected is lost by the failure to cor- 
relate the method of presentation with the purpose or function 
which they are to play. Thus death rates are tabulated by
districts so large that correlation of deaths with their re- 
spective causes in detail is impossible. From an administrative 
point of view, such statistics are almost worthless. Similarly, 
causes of death are frequently tabulated in groups so broad 
and ill-defined as to make it impossible to single out from the
groups the significant causes, and to use the statistics as a
basis for a health crusade. Again, density of population, a
common coefficient, is almost worthless when assigned to so
large a population and so diverse conditions as those found
in cities of appreciable size.⁹ Density as a coefficient is significant
where overcrowding is a problem. Not all sections
of cities are capable of producing the unit of density assigned 
to the entire district, while in many sections the density 
is far greater than the single unit implies. In some districts 
density is of no significance; in others, it is precisely the unit 
which is most vital. The units of presentation should always 
be chosen with the thought in mind of making the statistics 
function. 

Taking an illustration from a more strictly economic field, a
large part of our wage statistics, as presented for public consumption,
suffers almost beyond redemption because they are
reported as undifferentiated totals, as averages, or in groups
so broad as to conceal the facts which they might otherwise
reveal. The wages paid to a non-homogeneous class expressed
as a total or as an average without classification is of little
significance in throwing light on problems on which we need
light, such as the distribution of wealth, a sound basis for
arbitration of wage disputes, standards for minimum wages,
etc. The units of presentation are generally too broad; the
facts are related to conditions which do not produce them.

⁸ Method of Darwin: A Study in Scientific Method, McClure, Chicago,
1896, p. 92.

⁹ Bowley, A. L., The Nature and Purpose of the Measurement of
Social Phenomena, King, London, 1915, pp. 40 ff.

IV. Diagrammatic Scheme Illustrating Different Types 
OF Statistical Units — Figure 1 

[Figure 1. Diagram of the types of statistical units. Labels recoverable
from the original: “The Measurement Point of View”; “Statistical Units
of Measurement,” divided into “Units of Enumeration or Estimation”
(including “Composite”) and “Units of Analysis and Interpretation”
(“Ratios or Coefficients”). Examples shown under enumeration: number of
employes, square feet of floor space, advertising expense, full-time
salesmen. Examples shown under ratios: selling expense per unit of
sales, stock turnover. The labels “Space” and “City” also survive.]

V. A Selected List of Units of Analysis and of
Interpretation — Ratios or Coefficients 


For each entry: the unit; the formula used to compute the unit;
whether the unit is “crude” or “corrected.”

 1. Number of deaths per 100,000 of population
    Formula: Deaths ÷ Population (in 100,000's)
    “Crude”

 2. Number of deaths from specific cause for specific age group
    per 1,000 of population of corresponding ages
    Formula: Deaths by cause in specified age group ÷
             Population (in 000's) of corresponding ages
    “Corrected”

 3. Accident frequency rate
    Formula: Number of accidents ÷ Number of people employed (in 000's)
    “Crude”

 4. Accident frequency rate
    Formula: Number of accidents ÷ Number of full-time workers (in 000's)
    “Corrected”

 5. Sales per salesman
    Formula: Sales ÷ Number of salesmen
    “Crude”

 6. Sales per full-time salesman
    Formula: Sales ÷ Number of full-time salesmen
    “Corrected”

 7. Selling expense per hundred of sales
    Formula: Selling expense ÷ Sales (in 00's)
    “Crude”

 8. Rate of stock turnover
    Formula: Merchandise sold (at cost price) ÷
             Average stock (at sale price)
    “Crude”

 9. Rate of stock turnover
    Formula: Merchandise sold (at cost price) ÷
             Average stock (at cost price)
    “Corrected”

10. Rent per hundred of sales
    Formula: Rent ÷ Sales (in 00's)
    “Crude”

11. Rent per unit of floor space
    Formula: Rent paid for first floor ÷
             Floor space rented (in 00's of square feet) on first floor
    “Corrected”

12. Working capital ratio
    Formula: Total current assets ÷ Total current liabilities
    “Corrected”

13. Turnover of accounts receivable
    Formula: Average amount of accounts receivable ÷
             Average daily sales on account
    “Corrected”

14. Net ton-miles per loaded car-mile
    Formula: Net ton-miles (in 000's) ÷ Loaded car-miles (in 000's)
    “Crude”

15. Net ton-miles per loaded car-mile
    Formula: Net ton-miles (in 000's) of specific freight ÷
             Loaded car-miles (in 000's) of specified freight
    “Corrected”

Why some of these units are called “crude” and others “corrected”
the reader should be able to determine on the basis of
the above discussion.
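The contrast between a “crude” and a “corrected” ratio can be made concrete with a small calculation. The sketch below is only illustrative: the `Store` record, its field names, and every figure in it are hypothetical and do not come from the text. It computes sales per salesman and the rate of stock turnover twice, once with the crude base and once with the corrected base listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    """Hypothetical figures for one retail establishment (illustrative only)."""
    sales: float                     # sales for the period, in dollars
    salesmen: int                    # every salesman on the payroll
    full_time_salesmen: float        # full-time salesmen only
    merchandise_sold_at_cost: float  # merchandise sold, at cost price
    average_stock_at_cost: float     # average stock, at cost price
    average_stock_at_sale: float     # average stock, at sale price


def ratio(numerator: float, denominator: float) -> float:
    """Divide a quantity by its base, refusing a zero or negative base."""
    if denominator <= 0:
        raise ValueError("the base of a ratio must be positive")
    return numerator / denominator


def sales_per_salesman(store: Store, corrected: bool) -> float:
    # Crude: all salesmen. Corrected: full-time salesmen only.
    base = store.full_time_salesmen if corrected else store.salesmen
    return ratio(store.sales, base)


def stock_turnover(store: Store, corrected: bool) -> float:
    # Crude: stock at sale price. Corrected: stock at cost price,
    # so that numerator and denominator are valued on the same basis.
    base = store.average_stock_at_cost if corrected else store.average_stock_at_sale
    return ratio(store.merchandise_sold_at_cost, base)


# Assumed example data, chosen only to show the two bases side by side.
store = Store(
    sales=120_000,
    salesmen=12,
    full_time_salesmen=8,
    merchandise_sold_at_cost=80_000,
    average_stock_at_cost=20_000,
    average_stock_at_sale=32_000,
)

for corrected in (False, True):
    label = "corrected" if corrected else "crude"
    print(label, sales_per_salesman(store, corrected), stock_turnover(store, corrected))
```

In both cases the corrected unit differs from the crude one only in its base: the numerator is referred to the condition that actually produces it.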
VI. Rules for the Use of Statistical Units
of Measurement and of Presentation

1. UNITS OF MEASUREMENT 

(1) Refer all units of measurement to the conditions 
which produce them. Make them homogeneous, suited to the 
purposes for which they are employed, and use them with con- 
sistency and integrity. 

(2) Define clearly and fully all units which are used. 
Certain corollaries follow from this general rule: 

a. Study problems in all their aspects before defining the
units. Anticipate so far as is possible the difficulties
to be encountered, and make provision, if possible, for
others not foreseen.

b. Define all units in the light of the intelligence of the
informants and the character of the data to which
they apply.

c. Make all definitions in such a form that exceptions will
be readily detected, misunderstanding of terms difficult,
and employment ready, and in terms and form
characteristically employed.

d. Establish a logical basis for all definitions.

e. Avoid substantive or descriptive units when direct ones
are available.

(3) Appreciate the fact that statistics should be viewed 
functionally, and that a main source of error is in the units 
which are used in collecting and assembling data. 

2. UNITS OF PRESENTATION

(1) Avoid “crude” whenever “corrected” units may be
used.

(2) Seek to have units of presentation reflect the charac- 
teristics of data which are discovered in their measurement. 

(3) Choose those units which are suited to the needs and
purposes of the consumers to whom the statistics are presented. 
CHAPTER V


PURPOSES OF A STATISTICAL STUDY OF WAGES 
UNITS OF MEASUREMENT, SOURCES OF DATA, 
SCHEDULE FORMS— ILLUSTRATIONS OF 
METHODS 

I. The Problem in the Study of Wages Stated

1. INTRODUCTION

In the preceding chapters emphasis has been placed upon the 
logical order in statistical studies — (1) deciding upon the 
merits of the statistical approach, (2) outlining fully the pur- 
poses of study, (3) defining the units, and (4) assembling sec- 
ondary and collecting primary data. The relations between 
these various steps are concretely illustrated in this chapter in 
a study of wages. 

Much is now being written and spoken on the topic of wages. 
Socialists are condemning the “wage” system; social workers 
and those interested in ameliorating the condition of the poor 
are constantly urging the payment of a “living” or of a “mini- 
mum” wage. Wages is the bone of contention in industrial
disputes, and by some is thought to be the ultimate source of 
all our industrial ills. Efficiency advocates are studying va- 
rious methods of wage payment in an attempt to harmonize 
the principles of industrial efficiency with the interests of em- 
ployes and thereby to enlist their support in having them 
adopted. Others are testing the level of wages in terms of their 
purchasing power either to measure their trend or to demon- 
strate their reasonableness. Still others are attempting to 
adjust to an increased nominal wage scale the prices charged 
for commodities and services in the hope of “making both ends
meet.” To employes, wages are too low; to employers, they
are too high. To one, they are income, to the other, costs. 
The importance of the subject in all its vagaries is sufficient 
reason for choosing it in order to illustrate certain principles of 
statistical methods. 

It has been thought best to approach the problem from the 
standpoint of a public bureau collecting data from many em- 
ployers, rather than from the standpoint of a single employer
assembling wage data in his own establishment. The first ap- 
proach, in a sense, includes the second, inasmuch as each em- 
ployer must organize the material in his own plant before 
filling out the schedule for the collecting bureau. Moreover, 
employers are always interested in the wages their competitors 
are paying, and the only available sources for the necessary 
facts are the reports of public bureaus. They are likewise
interested in the collection process, for only by a full knowl- 
edge of it are they in a position to know the meaning of col- 
lected data. The finished product is the basis for any 
comparisons which they may desire to make, and consequently 
its scope, merits, and demerits must be known. 

When employers deal with their employes in matters affecting
wage disputes, they need information on competitive wage
scales; when they are concerned with their position in industry 
or trade, they need to know not only their own but also their 
competitors' labor costs. 

There is another reason for approaching the problem from 
the point of view of an outsider. Units of measurement and 
types of reports are generally standardized within individual 
establishments. As between establishments, however, they dif- 
fer considerably. For this reason, wage comparisons are often 
of little value, although they are given much weight, and it is
the dangers involved in making them which are here given
particular attention. These are traceable to (1) inaccurately
and loosely defined units of measurement, (2) unrepresentative,
biased, and crudely tabulated data, and to (3) the failure
to understand what is involved in a statistical comparison. In 
order to use statistics with discrimination and integrity, it is
necessary to have a knowledge of their source, of the interpre- 
tation given to the original entries, of the groups and combina- 
tions into which they are thrown, etc. It is with these thoughts 
in mind that so much attention, in the preceding chapter, has 
been given to units, and that in this one the collection process
for a concrete problem is discussed from beginning to end. 

2. CHARACTERISTIC CONFUSIONS IN THE USE OF
THE TERM “WAGES”

The meaning of the term “wages” in current discussions is
generally clear from the context in which it is used. When the 
term is employed statistically, however, its various uses fre- 
quently cause misunderstanding and confusion. Wages and 
earnings are often used synonymously without any seeming 
appreciation of their differences. Wages and wage-rates, 
nominal or money rates and real wages are used interchange- 
ably, or at least without clear distinction of the differences 
involved and the conditions upon which they rest. The term 
“salaries,” as contrasted with wages, is used to distinguish 
large and regular from small and precarious incomes, notwithstanding
the fact that the bases chosen are in part illogical
when income as salary is less than income as wages. More- 
over, the criteria by which the two are distinguished are not 
standardized; the rules set up are not always strictly adhered
to and statistical studies, based upon current distinctions or in 
violation of them, sometimes lead to grotesque conclusions. 
The necessity of relating facts to the conditions producing 
them, and of making comparisons involving considerations of 
time, space, or condition legitimate, are constantly being 
violated. 

The reasons for and types of confusion in the use of this 
expression will more clearly be seen by studying various pur- 
poses for which one would wish statistical information on 
wages, and by defining the limits of the term as used for these 
purposes. No attempt is made to cover all, but only those 
purposes which bring out the peculiar statistical difficulties to
which it is desired to call attention. 

3. BASES FOR A DEFINITION OF WAGES

Wages are defined in current economic discussions as “the
income received on account of labor performed,”¹ “the price of
labor hired and employed by an entrepreneur,”² or as including
“all earnings assigned to men for their work, from lowest
piece wages to highest annual salaries and ‘wages of
management.’”³ In a still different sense the term is used to indicate
“the share of the annual product or national dividend which
goes as a reward to labor, as distinct from the remuneration
received by capital in its various forms.”⁴ The term thus defined
is too indefinite for statistical use, yet the distinctions
suggest the differences to which it is desired to call attention.
The first suggests property as contrasted with service income,⁵
but does not distinguish money income from real income nor
salaries from wages. The distinction between the wage system
and other possible methods of service remuneration is reflected
in the second, while the last calls attention to a use restricted
to economic theory, namely, that of distinguishing the reward
of labor as contrasted with the reward of landlords and
capitalists.

A number of distinctions must be made in order to use the 
term in statistical studies. Wage-rates must be distinguished
from earnings, nominal rates from real rates, and earnings
from labor (wages) from earnings from all sources including
returns from investments, rents, etc. It is necessary also to
distinguish wage-rates from salary-rates, and wages (wage-rates
times the period for which paid) from salaries (salary-rates
times the period for which paid). In converting

¹ Johnson, A. S., Introduction to Economics, p. 152.

² Gide, Chas., Principles of Political Economy (Second American Edition),
p. 487.

³ Seager, H. R., Principles of Economics, p. 244.

⁴ Webster, New International Dictionary.

⁵ See Nearing, Scott, Income, Macmillan, 1915.
wage-rates into wages the former must be increased by the
money equivalent of concessions and perquisites and decreased
by the money equivalent of time lost for which no compensation
is received. Money wages must be clearly differentiated
from real wages, or “the purchasing power of nominal wages
measured by a constant standard.” When computing real
wages and making allowance for concessions, perquisites, pay- 
ments in kind, and unemployment, the nominal money equiva- 
lent must be reduced to its purchasing power and added to or 
subtracted from, as the case demands, the money wages 
similarly reduced. 
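The conversions just described may be sketched as simple arithmetic. What follows is a minimal illustration under stated assumptions, not a prescribed method: the function names, the hourly basis of the rate, the price index with a base of 100, and all figures are hypothetical.

```python
def money_wages(rate_per_hour: float,
                nominal_hours: float,
                hours_lost: float = 0.0,
                perquisites: float = 0.0) -> float:
    """Convert a wage-rate into money wages for one pay period.

    The rate times the nominal period is increased by the money
    equivalent of concessions and perquisites and decreased by the
    money equivalent of time lost without compensation.
    """
    if not 0.0 <= hours_lost <= nominal_hours:
        raise ValueError("hours lost must lie between 0 and the nominal hours")
    return rate_per_hour * nominal_hours + perquisites - rate_per_hour * hours_lost


def real_wages(money_amount: float, price_index: float, base: float = 100.0) -> float:
    """Reduce a money amount to purchasing power at the base-period price level."""
    if price_index <= 0:
        raise ValueError("the price index must be positive")
    return money_amount * base / price_index


# Assumed figures: 25 cents an hour, a 50-hour week, 4 hours lost,
# and perquisites worth $1.50 in the week.
week = money_wages(rate_per_hour=0.25, nominal_hours=50, hours_lost=4, perquisites=1.50)
print(week)                                  # money wages for the week
print(real_wages(week, price_index=125.0))   # the same wages at base-period prices
```

Because the reduction to purchasing power is a simple division, deflating the total gives the same result as deflating each component separately and then adding or subtracting, which is the order of operations the text describes.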


4. WAGES DEFINED

The term “wages,” therefore, will be used to suggest various
concepts but always with the following meanings:

By wages, when used alone, are meant earnings in money or 
its equivalent because of manual, mechanical, or clerical labor 
service, paid according to a stipulated scale, at frequent inter- 
vals, and under conditions which make it customary to make 
deductions for short periods of time lost. This definition does 
not admit of the term being used to cover labor’s “share” in
contrast with the shares of capital and land in distribution.

By wage-rates are meant the predetermined rates at which 
manual, mechanical, or clerical labor service is remunerated. 
Wage-rates multiplied by the period for which paid equal 
wages as defined above. 

By salaries are meant earnings in money or its equivalent 
because of responsible, supervisory, or directive labor service, 
paid according to a stipulated scale at infrequent intervals and 
under conditions which make it customary not to make de- 
ductions for short periods of time lost. 

By salary-rates are meant the predetermined rates at which 
responsible, supervisory, or directive labor service is remuner- 
ated. Salary-rates multiplied by the period for which paid 
equal salaries as defined above.
By earnings, when used alone, are meant money incomes or
their equivalents received for labor services, without dis- 
tinction between wages and salaries. The same term, in order 
to include other income than that regularly received from labor 
service, must be accompanied by a limiting expression. 

By real wages are meant the equivalents of money wages in 
economic goods measured in terms of a constant standard of 
value. 

Some of the purposes for which statistical studies of wages, 
as currently understood, may be undertaken, and the meaning 
which the expression must have in each case will now be 
discussed.

5. STUDIES OF WAGES AND THE USES OF TERMS

If the purpose of study were to approximate the effect 
which trade unions have upon wages, one would be inclined at 
first to restrict the study to wage-rates, since minimum scales 
are determined by unions in bargaining with employers. Union 
figures on wages are invariably quoted as rates and are usually 
nominal and minimal. The actual rate received is frequently 
higher than the specified minimum; in some cases it may be 
even lower. If by wages are meant earnings from manual, 
mechanical, or clerical labor service, then the effect of union 
activities on employment would have to be considered. Wage- 
rates may remain the same and still wages be materially 
affected. This fact introduces other difficulties. Are unemployment,
strike, and other benefits to be considered offsets to
wage losses, or are they considered to be counterbalanced by
increased dues necessary to replenish depleted unemployment,
strike, and sickness funds? Union activities may seriously
affect wages but have no influence on earnings from other 
sources. Wages, therefore, must be distinguished from earn- 
ings, if the latter are meant to include earnings from other 
than labor services. 

When “minimum” wages are discussed, wages, undoubtedly,
are understood to mean rates, since employers are not compelled
to hire labor but only to pay at least the stipulated
minimum to those employed.¹ On the other hand, when the
term “living” wage is used, reference is not so much to the
rate of wages nor even to wages alone from labor service, as to
earnings from all sources under the conditions possible for the
persons affected. Undoubtedly, earnings from other sources
than labor service, in the cases of those to whom the receipt
of a living wage is a problem, are almost negligible, yet the
term “income” is more suitable than the term “wages” to
describe this condition.

In comparing wages for manual, mechanical, and clerical 
labor service by industries, occupation, districts, etc., it is 
necessary to use wage-rates instead of wages, since only the 
former are generally available. It is next to impossible to 
trace individuals from industry to industry and to approximate,
with any degree of accuracy for an extended period, the
extent of unemployment, the amount of overtime worked, etc.²
It is doubtful if anything better than classified rates are pro- 
cured by statistical bureaus which ask for earnings. The rates 
as quoted by trade union sources are always minimal and 
nominal and, therefore, are of limited significance in deter- 
mining the economic status of the groups concerned. Those 
secured from employers are for a limited period — generally a 
week, except in intensive studies — and are not a satisfactory 
measure of earnings from labor service. Wages instead of 
rates are necessary for this purpose. The same fact applies 
in studies relating wages to efficiency, to sex, to nationality, to 

¹ The order on minimum wages in the brush-making industry in Massachusetts
specifically takes account of the rates to be paid. “Assuming an
average scale of 50 hours and regular employment” (a rather violent
assumption) “this rate (15½¢) would yield earnings of $7.75.” Quoted
from “Estimates of a Living Wage for Female Workers,” by Charles E.
Persons, in Publications of the American Statistical Association, June,
1915, p. 577.

² For the difficulties involved even in an intensive study, see “Wages
and Regularity of Employment in the Cloak, Suit, and Skirt Industry,”
etc., Bulletin of the United States Bureau of Labor Statistics, Whole
Number 147, June, 1914, pp. 14, 41, 42, 50.
length of service, etc. Wage-rates are the only data generally
available and, of course, should be used as such. 

If the determination of the trend of wages is the problem 
to be studied, wages may mean a number of things. Wage- 
rates, or earnings in the broad or in the narrow sense, may be 
considered. Study may extend to nominal or money wages or 
to real wages, and may include not only wage labor but 
salaried labor as well. If the trend of real wages (“the
purchasing power of nominal wages measured by a constant
standard”) is the object of study, rates and not earnings must
be used, since it is only the former of which we are in possession,
or which we may secure with reasonable accuracy on an
adequate scale. Homogeneous wage groups must also be used.
Moreover, a logical basis for the inclusion or exclusion of 
salaries must be established, care being exercised that the basis 
of distinction is followed throughout the entire period. Noth- 
ing is here said about the price index used in making the 
conversion of wage-rates into current prices or of the peculiar 
difficulties in adjusting the index to the classes of labor to 
which the comparison applies. 
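Setting aside the choice of price index, which the text leaves open, the mechanical step of tracing the trend of real wage-rates can be sketched as follows. This is a rough illustration only: the function name, the years, the wage-rates, and the index numbers are all hypothetical, and a single homogeneous wage group is assumed throughout the period.

```python
def real_wage_rate_trend(nominal_rates: dict, price_index: dict, base_year: int) -> dict:
    """Express real wage-rates as index numbers with the base year equal to 100.

    nominal_rates maps each year to the money wage-rate of one
    homogeneous wage group; price_index maps the same years to a
    price index measured by a constant standard.
    """
    if set(nominal_rates) != set(price_index):
        raise ValueError("rates and price index must cover the same years")
    if base_year not in nominal_rates:
        raise ValueError("the base year must be one of the years covered")
    # Deflate each year's nominal rate by that year's price level.
    real = {year: nominal_rates[year] / price_index[year] for year in nominal_rates}
    # Refer every year to the base year.
    return {year: 100.0 * real[year] / real[base_year] for year in sorted(real)}


# Assumed figures for illustration only.
rates = {1913: 0.20, 1914: 0.22, 1915: 0.24}      # dollars per hour
prices = {1913: 100.0, 1914: 110.0, 1915: 125.0}  # price index, 1913 = 100

for year, index in real_wage_rate_trend(rates, prices, base_year=1913).items():
    print(year, round(index, 1))
```

With these assumed figures the nominal rate rises in every year, yet the real rate is unchanged in the second year and lower in the third, which is the kind of divergence a study of the trend of real wages is meant to expose.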

If the purpose of a study of wages were to determine from 
the producers’ standpoint the relative costs involved in labor 
service, as contrasted with rents or interest, obviously, rates 
of wages in the narrow sense used above would be too ex- 
clusive a category. Distinctions between salaries and wages 
would be unnecessary, since the purpose is merely to deter- 
mine production costs assignable to labor as distinct from land 
and capital. If the approach to the same problem were made 
from the social viewpoint, it would be necessary to distin- 
guish between wages and salaries, and on grounds other than 
those generally followed, inasmuch as those are frequently 
illogical and indeterminate. Merely to call one group salary 
receivers and another group wage receivers results in confusion 
when the economic conditions of both are similar, and when 
criteria for determining the status of one apply with equal 
force to the status of the other. There would be the same 
reasons for accurately defining salaries as for defining wages.
The bases for the definitions should be factors of importance in 
the study in which the units are used. It is inappropriate to 
contend that the conditions according to which the units are 
defined change with each purpose and, therefore, that such 
units are unsuitable for statistical uses. The premise is valid, 
but the conclusion does not follow. Such a claim only serves 
to bring more forcefully to mind a fact already considered, 
namely, that while abstract measures of numerical frequence 
are employed in statistical studies, they are not used ab- 
stractly but are applied to units the limits and terms of which 
are conditioned by the uses to which they are put. 

II. The Relation of the Problem as Outlined to
Statistics of Wages 

The preceding discussion has served in a general way to 
show the necessity of accurately defining units of measure- 
ment in connection with the purposes of statistical studies, 
and to emphasize the necessary points of distinction in the use 
of such a word as “wages,” but it has probably not related,
with sufficient closeness, the subject to actual statistical data 
and suggested the problems by which one is confronted in 
using wage data possible of collection or currently collected. 
This closer relation we shall now establish by indicating the 
sources for primary wage data, by discussing the difficulties 
experienced in their collection, by describing the types of sec- 
ondary data currently collected, and finally by constructing 
wage schedules to be used in connection with a concrete 
problem. 

1. SOURCES FOR PRIMARY DATA IN WAGE STUDIES 

(1) Primary Data Directly Applicable to Studies of Wages

Primary data in the study of wages may emanate from four 
sources. Those secured from employes, from employers, and 
from union officials are directly applicable; while those from 
institutions such as banks, building and loan associations, in-
surance companies, lodges, etc., are only indirectly applicable. 

a. Data from Employes 

Data on wage-rates; hours of work (nominal and actual); 
the amount of unemployment by cause; the methods and fre- 
quency of wage payment; earnings from labor and from other 
sources; perquisites in the forms of bonuses, benefits, profits; 
penalties, fines, forfeits, union dues; budgetary expenditures, 
and facts relating to age, sex, nationality, occupation, train- 
ing, length of service, previous wages, etc., may be secured in 
whole or in part, satisfactorily or unsatisfactorily, from in- 
dividual employes, in proportion as informants are wise or 
ignorant, truthful or deceitful, willing or unwilling to aid, and 
in proportion as the statistical organization used is well- or ill- 
adapted for the purpose in mind. It is impossible to sum- 
marize in a single sentence the success attainable in securing 
data on wages or on any other topic directly from individuals 
involved. Frequently, the costs are prohibitive; in other cases, 
where cost is not an insuperable barrier, the types of individ- 
uals dealt with and the character of the information desired 
make this approach impossible. The generalization, however, 
is hazarded that data collected from a source where personal 
supervision or intimate checking is impossible are likely to 
possess serious limitations respecting all topics which in any 
way call for discrimination, for the exercise of judgment or the 
use of records, etc., on the part of the informant, or in which 
the personal equation enters to an appreciable extent. The 
discussion, in Chapter III, of The Collection Process is particularly
applicable in this connection.

b. Data from Employers 

Much the same types of wage data as those listed above are 
theoretically obtainable from employers, and the chances are
much greater that they will be free from error since less ignorant
groups, recorded facts, impersonal relations, etc., are dealt
with. The facts, however, are of a somewhat different sort and 
rarely apply to an extended period. The best that can be done 
in most cases is to secure cross-section views at widely separated
intervals. Moreover, for the most part, classes and
not individuals are considered. These may or may not be 
homogeneous, and in this respect are much less desirable sta- 
tistical units than are individuals. From this source, with an 
adequate statistical organization, and with sufficient sanction,
the total wage bill, time- and piece-rates, by occupations and 
processes, classified wage-rates, perquisites allowed and penal- 
ties assessed, and the number of employes classified by sex, 
age, and time of employment, etc., are theoretically available.
The facts regularly secured on an extended scale and available 
for use are discussed below. 


c. Data from Trade and Labor Unions 

In many respects the records of trade and labor unions are 
satisfactory sources for wage data. Theoretically, nominal 
time- and piece-rates for regular, for overtime, and for Sunday 
and holiday labor; nominal hours per day and per week; benefits
allowed, classified by the amounts paid, by purposes, by
duration, etc.; union dues; numbers unemployed, classified as 
to causes, and wage losses, etc., are available from this source. 
The data, however, may have serious limitations. Frequently, 
the desire to make out a case is held to be sufficient cause for
furnishing defective returns or for withholding information.
In many instances the inquiries addressed to union officials 
concern matters about which they can have but the most in- 
adequate and superficial knowledge, and yet they are urged 
to give positive, negative, or numerical answers with few or 
no opportunities being offered for explanations. In some in- 
stances, undoubtedly, sincere efforts are made to state the 
truth as nearly as it can be determined; in other instances, no
such care is exercised. The value which data from this source 
possess is to a large degree determined by the scrutiny to
which they are subjected by collecting agents. 

The limitations, however, are not always to be attributed 
to errors in reporting nor to incomplete returns. Frequently, 
they result from misusing and assigning finality to figures at 
best but estimates, from ignoring the specific advice of collect- 
ing agents, and from violating the fundamental principles of 
statistical methods. The same result, however, may occur re- 
specting data drawn from the most acceptable sources. Sta- 
tistical facts will be cited to prove contentions with which they 
have no connection and will be distorted and misapplied so 
long as people have hobbies, lack integrity, or are ignorant of 
the functions, limitations, and purposes of statistical data and 
legitimate ways of using them. 

It will be noted that data on wages from unions are re- 
stricted to nominal rates and to union members. These are 
serious limitations where wages or earnings are sought and 
where non-union labor is involved. Such data are of little
value in discussions of minimum wages, living wages, or other 
topics in which light is desired primarily concerning unskilled 
labor. 

(2) Data Indirectly Applicable to Studies of Wages

Facts which contribute indirectly to a knowledge of wages 
and wage conditions may be gleaned from a study of the in- 
crease or decrease of savings, the number of depositors in 
savings institutions and the average deposit, the size of em- 
ployers' payrolls, the activities of building and loan associ- 
ations, the growth or decline of fraternal insurance, the 
increase or decrease of union membership, etc. In most re- 
spects their connection with the topic is remote and contingent. 
They are at best suggestive and corroboratory and should be 
used with extreme caution, cognizance being taken of the 
roundaboutness of their application, the potency of other con- 
tributing causes to produce the effects shown, the interrelation 
of economic phenomena, etc. 
Having sketched the types of wage data theoretically
available, their sources, and the difficulties in securing and 
the dangers in using them, we may now briefly enumerate the 
types currently collected with their sources and some of their 
peculiarities. No attempt is made to describe or criticize fully 
or even to enumerate all forms regularly and irregularly col- 
lected in the United States. This has been done in a general 
way by others.¹ Moreover, such a treatment is not germane
to our immediate purpose. 


2. TYPES OF SECONDARY WAGE DATA

Secondary data on wages collected from the chief primary 
sources are available in many forms. They appear in public 
and private reports, issued on the basis of data furnished by 
wage earners, employers, and unions. Some reports appear 
regularly, some irregularly; some are restricted to the single 
topic, while others bear upon it only indirectly. Some are 
monographs on special topics, while others are exhaustive in- 
dependent surveys. 


(1) Secondary Data Directly Applicable to Studies of Wages ²
a. Data from Employes 

Wage studies, in which the material is drawn from individ- 
uals alone, are made primarily in connection with cost of living 

¹ Nearing, Scott, Income, Chapter II, pp. 18-62, New York, 1916;
Streightoff, F. H., The Distribution of Incomes in the U. S., Columbia
University Studies, Vol. LII, No. 2, 1912.

² In this revision, it is thought not to be necessary to bring up to date
the descriptive details in this chapter. The types of wage data which 
are collected, the manner in which collection is made, and the way in 
which the data are published are constantly changing. The details 
furnished, while not necessarily complete nor accurate for 1925, are 
sufficiently suggestive of the conditions which obtain. This is all they 
are intended to be. An introductory text on statistical methods is not 
intended to be an encyclopedia of statistical practices. 
studies, such as those of Chapin¹ and Mrs. More² in America;
Rowntree³ and Booth⁴ in England; or as a condition of the
administration of labor laws, such as those on compensation for 
industrial accidents. Those of the first type generally apply to 
limited territories and restricted groups, cover only a relatively 
short period, and are made in connection with or are designed 
to throw light upon budgetary matters. In those of the second 
group, wage data are subsidiary to the main purpose of study, 
are restricted to definite classes, are not collected simulta- 
neously for all groups, in some instances are semi-confidential, 
and are generally too meager to be conclusive respecting either 
ruling wage-rates or wages. Hence, they are not generally
published except in summary form along with accident and
other data.⁵ They are, however, of excellent quality, because
of the purposes for which collected, and in the course of time 
when they have been sufficiently accumulated will undoubtedly 
furnish material for thorough and comprehensive wage studies. 

Studies on wages from material drawn directly from employes
are published only at irregular intervals and cannot
wholly be relied upon for current information. Those associ- 
ated with budgetary matters refer invariably to wages or to 
earnings; those arising out of the administration of labor laws 
always relate to rates of wages. Those of the first class are 
important in calling attention to low wages in certain in- 
dustries, in certain districts, for limited groups, and are indis- 
pensable in the determination of minimum and living wage 
standards, but are inadequate for comparing wages by industries,
by localities, and over long periods. Neither do they
furnish material for measuring the trend of wages. Those of
the second class may be used to correlate wage losses and
amounts of compensation for accidents, but at present are in
the main superficial and restricted studies, serving no other
purpose than that of a record of wage data collected on
accident schedules.

¹ Chapin, Robert C., The Standard of Living Among Workingmen's
Families in New York City, Charities Publication Committee, New
York, 1909.

² More, L. B., Wage Earners' Budgets, New York, 1907.

³ Rowntree, Poverty: A Study of Town Life, London.

⁴ Booth, Charles, Life and Labor of the People, London, 1891.

⁵ The brief tables on wages in “First Annual Report of the Industrial
Accident Board,” Massachusetts Industrial Accident Board, Boston,
1914, and in “Report No. …” on “Industrial Accidents in Ohio, January …,”
… Commission of Ohio, Columbus, Ohio, 1915, are illustrative.


b. Data from Employers 

The statistical matter relating to wages and wage conditions 
reported and published by regularly constituted statistical 
bureaus, by special commissions, and by individual investigators, 
may be divided into two groups: those directly related 
and those remotely connected with the topic. 

(a) Material Directly Related to Wages 

Direct material relates, first, to the total wage bill paid, and 
second, to classified wage-rates. The United States Bureau of 
the Census publishes at decennial and at certain intercensal 
periods the total salary and wage payments, made during the 
year to which the census applies, to salaried officers, to superintendents 
and managers, to clerks, stenographers, and other 
salaried employes, and to wage earners including piece work- 
ers in manufacturing and mining industries. The Interstate 
Commerce Commission publishes monthly the amounts paid 
to railroad employes classified into one hundred and forty- 
eight classes. The same commission publishes for express com- 
panies the wages and salaries of employes in the “traffic,” 
“transportation,” and “general expense” divisions. A few state 
bureaus of statistics and labor, particularly those in Massa- 
chusetts, New Jersey, and Ohio, collect and publish, as part 
of their manufacturing censuses, the total compensation for 
labor services classified as salaries and wages. The schedule¹ 
used by New Jersey calls for the “Total amount in wages paid 
during the year,” and instructs informants that “only wages 
paid to wage earners actually employed” in an establishment 
or in “erecting or placing its products elsewhere” should be included. 
Salaries of managers, bookkeepers, salesmen, etc., are 
to be omitted. The schedule² to manufacturers used by Massachusetts 
asks for the “total wages (paid during the year to 
wage earners only),” and instructs the informants to omit 
“salaries of agents, managers, bookkeepers, clerks, salesmen, 
and others of this class.” The schedule³ used by Ohio contains 
essentially the same questions and provides for the same 
omissions, except that salespeople are divided into two groups, 
traveling and non-traveling. 

¹ Bureau of Statistics of Labor and Industries, 

Classified weekly wage-rates are collected and published 
for manufacturing enterprises in a number of states, but most 
satisfactorily in Massachusetts, New Jersey, and Ohio. In 
those instances the data are taken from payrolls. Massachusetts 
and Ohio in their schedules ask specifically for weekly 
rates, while New Jersey apparently desires weekly earnings.⁴ 
Massachusetts and New Jersey supplement their schedules by 
field agents. Ohio is able to dispense with these in connection 
with her wage studies, inasmuch as, in the administration of 
her compensation law, she secures the audited payrolls of all 
employers subject to the law. It is not likely, under these 
conditions, that employers affected by the law in both re- 
spects will furnish incorrect returns. 

The most exhaustive study of classified wage-rates for the 
United States is that on Employees and Wages made by the 
Census Bureau in 1903 under the direction of Professor 
Davis R. Dewey, and known as the “Dewey Report.” The 
data refer to the years 1890 and 1900, apply to thirty-three 
industries, but include only a limited number of establishments 
in each industry. Wages of 103,453 employes in 1890, 
and of 160,859 in 1900 were tabulated in detailed groups. 
While the study is exhaustive in scope and unique in method, 
it is not of current interest and must be passed over with 
brief mention. 

² The Bureau of Statistics, Division of Manufacturers, 

³ The Industrial Commission. It is not quite correct to speak of a 
“Manufacturing” census in the case of Ohio. 

⁴ The data are published as “earnings” but undoubtedly are rates. 

The United States Bureau of Labor Statistics publishes from 
time to time special studies on wages and hours in different industries. 
These are always of interest. Indeed, this Bureau is 
the source from which most satisfactory data may be expected. 


(b) Material Indirectly Related to Wages 

The material indirectly bearing upon wages may be classi- 
fied under two heads, first, actual or average number of em- 
ployes by months, and second, the time which plants operate 
during the year. 

The United States Bureau of the Census publishes for 
manufacturing and mining industries the number of wage 
earners, including piece-workers, as per payrolls or time records, 
on the fifteenth day of each month for the periods covered 
by its reports. No distinctions are made for age and sex 
classes. New Jersey, as a part of her manufacturing census, 
publishes the “number of persons employed”¹ during each 
month of the year for which study is made, classified by sex 
for those sixteen years of age and over, but without sex classification 
for children under sixteen. Massachusetts publishes 
the average² number of wage earners during each month for 
males and females separately but without age classification. 
She likewise publishes the number of wage earners eighteen 
years of age and over and under eighteen years of age, classified 
by sex, on the thirteenth³ day of December as per payroll. 
Ohio requires employers to report the number of wage earners 
employed on the fifteenth day of each month as per payroll, 
classified by sex but not by age. 

¹ Neither the instructions to informants nor the schedules define this 
number. Whether it is to be the average force computed on the basis of 
twenty-six, thirty, or thirty-one days, to be the normal force during the 
period, or the number of separate individuals to whom employment was 
given during each month, we are not told. It conceivably might be any 
one of them, carefully computed, but more likely it is a rough average 
representing nothing better than an estimate. 

² The use of an average in this case seems unnecessary and somewhat 
lessens the value of the figures in computing the deviations from 
month to month, with the purpose of throwing light on the seasonal 
character of employment. There seems no sufficient reason why the 
exact number, as required by Ohio and others, should not be called for. 
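
The month-to-month deviations mentioned in the note on averages can be illustrated by expressing each month's count as a percentage deviation from the mean for the year. The following is only a minimal sketch in Python under stated assumptions: the twelve counts are hypothetical and are not figures from any bureau, and the function name monthly_deviations is introduced here purely for illustration.

```python
from statistics import mean


def monthly_deviations(counts):
    """Return each month's percentage deviation from the yearly mean count."""
    avg = mean(counts)
    return [round(100 * (c - avg) / avg, 1) for c in counts]


# Hypothetical numbers of wage earners on the 15th of each month,
# January through December (assumed data, not taken from any report).
counts = [950, 960, 1000, 1040, 1050, 1000, 900, 920, 1000, 1060, 1080, 1040]
deviations = monthly_deviations(counts)

for month, dev in enumerate(deviations, start=1):
    print(month, dev)
```

Exact counts for a fixed day, such as Ohio requires, can be used in this way directly; a monthly average of undefined meaning would blur the same comparison.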

Ohio, likewise, requires employers to report the number of 
full days that plants are in operation and idle during the year, 
the former including part-time days reduced to a full-time 
basis and the latter not including Sundays and holidays unless 
plants normally operate on these days. The number of hours 
normally worked per full day or shift and per full week is also 
required to be reported. In Massachusetts the number of days 
in operation and idle is included in the manufacturing schedule 
and published in this form. Informants are specifically re- 
minded that the working year is composed of a stated number 
of days and that the sum of the days reported, not counting 
Sundays and holidays, should total to this number. In New 
Jersey, data are published for manufacturing establishments 
on the number of days in operation, the normal number of 
hours per day, the normal number of hours per week, and the 
total number of hours extra time during the year in which 
establishments operate. The Bureau of the Census publishes 
like figures on the number of days manufacturing and mining 
establishments are in operation during the year and the num- 
ber of hours normally worked by wage earners per shift and 
per week. Respecting the latter topic, informants are instructed 
that “all that is desired to know is the practice generally prevailing 
in respect to the hours of labor of employes.” 

³ This is the date indicated in the schedule for 1913. 

c. Data from Trade and Labor Unions 

The wage data regularly collected from union sources by 
statistical bureaus refer to nominal (minimum) time- and 
piece-rates, nominal (maximum) hours per day or per week, 
causes and extent of unemployment, number and duration of 
strikes, etc. In this descriptive part of the chapter it will 
suffice, in view of what has been said above, briefly to describe 
the statistical activities of the United States Bureau of Labor 
Statistics, of the Department of Labor of the State of New 
York, and of the Bureau of Statistics of Massachusetts, respecting 
union wage conditions. 

The United States Bureau of Labor Statistics has published 
the union scales of wages and hours of labor for the principal 
mechanical trades, for the largest cities of the United States 
for the period 1907 to date. The report for 1913 covers the 
forty industrial cities located in thirty-two states for which 
the Bureau publishes retail price statistics. Union scales for 
both wage-rates and weekly hours are followed, but such 
scales fix the limits in only one direction. Minimum wage- 
rates are established below which members of unions will not 
as a rule work, and maximum hours beyond which they will 
not work at regular rates of pay. In certain cities and trades, 
workmen are paid more than the union scale and work reg- 
ularly less than the scale of hours. However, the Bureau takes 
no cognizance of these conditions. All wage-rates are reduced 
to an hourly basis, and for all the trades for which the Bureau 
has figures, relative or index numbers are computed for both 
wage-rates and hours for the years 1907 to 1913. The data 
are collected by special agents in personal visits to union business 
agents and secretaries, and the wage scales, written agreements, 
and the trade union records are consulted wherever 
available.¹ 
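
The reduction of scales to an hourly basis and the computation of relative numbers can be sketched as follows. This is an illustration only, not the Bureau's actual procedure: the scale figures are hypothetical, the use of 1907 as the base year is an assumption (the text says only that index numbers cover 1907 to 1913), and the function names are introduced here.

```python
def hourly_rate(weekly_rate, weekly_hours):
    """Reduce a weekly union scale to an hourly basis."""
    return weekly_rate / weekly_hours


def index_numbers(series, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = series[base_year]
    return {year: round(100 * value / base, 1) for year, value in series.items()}


# Hypothetical union scale for one trade in one city:
# year -> (minimum weekly rate in dollars, maximum weekly hours).
scale = {1907: (24.00, 48), 1910: (26.40, 48), 1913: (27.50, 44)}

rates = {year: hourly_rate(w, h) for year, (w, h) in scale.items()}
hours = {year: h for year, (w, h) in scale.items()}

rate_index = index_numbers(rates, base_year=1907)   # assumed base year
hours_index = index_numbers(hours, base_year=1907)

print(rate_index)
print(hours_index)
```

Because the scales fix only minimum rates and maximum hours, such relatives describe the movement of the scale and not necessarily that of rates actually paid.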

¹ A similar study, in co-operation with the United States Bureau of 
Labor Statistics, is made by the Industrial Commission of Ohio and 
applies to all the larger cities in the state. 

Statistics of unions and their membership were first collected 
by New York State in 1894 and 1895. Since 1897 such statistics 
have been regularly published. Information is now collected 
semi-annually from all unions, in part by schedule and 
in part by field agents. Schedules relate to membership and 
idleness, to hours of work, to new trade agreements, to changes 
in the rates of wages, and to rates of wages of time workers. 
The amount of unemployment is reported under six specific 
and one miscellaneous head: lack of work, lack of material, 
the weather, strikes or lockouts, sickness or accident, old age, 
and miscellaneous. The data apply to the sexes separately and 
to the end of March or September as the case might be. The 
regular hours of work for Saturday, Sunday, and other days, 
and the total per week by branches of trades and for the sexes 
separately are included. Changes in hours, with those before 
and after each change, and number of persons affected are 
also requested. Respecting rates of wages, information is se- 
cured on the rates before and after changes, the number of 
members affected, and the estimated weekly earnings before 
and after changes in the case of piece workers. Schedules re- 
specting wage-rates of time workers relate to each branch or 
grade of work, to the working hours per day for the specified 
rates, and to the number of members by sex receiving them. 
Other inquiries of less significance and certain modifications of 
these are also included. It is unnecessary for our present 
purposes to supply more details. 

The schedule is a model in technique; the questions are vital, 
clearly stated, and well arranged. It is mailed to union sec- 
retaries, ten days are given for answering, and delinquents are 
visited by field agents of the Bureau. Approximately 50 per 
cent of the schedules are sent in by mail and 50 per cent 
“fielded.” 

The published material is issued in two series: one called 
“Series on Employment” and the other “Series on Labor 
Organization.” The first shows the amount of unemployment 
by cause, by months, and includes summaries for years by 
industries and by detailed trade groups. The issuance of a 
letter on the state of the labor market based upon monthly 
returns from the larger unions is also a regular feature of the 
Bureau’s activities. The second series relates to the number 
and membership of unions classified so as to show data by 
industries, by trades, by localities, etc. 

This account of the New York Bureau’s activities respecting 
union wages and conditions, although brief and sketchy, is 
probably adequate to reveal in a general way the types of 
data collected and the manner of securing them. Neither the 
schedules nor the methods of tabulation are open to severe 
criticism. The only criticism which might be offered is that 
the facts are supplied by unions. Essentially the same facts, 
but in a different form, respecting wages, hours, and unemploy- 
ment, are available from employers and the probabilities are 
that they are more accurate when so returned than are those 
furnished by unions in spite of the care exercised to correct 
the errors. Employers are subject to state supervision in 
many respects, the statistical machinery is adjusted to this 
source of information, and the reporting of facts may be re- 
quired legally. Unions are not compelled to report nor are 
they punished for withholding or distorting the matter sup- 
plied. In one respect, however, it seems necessary to deal 
with unions as units. Public and private boards of arbitration 
require union scales of wage-rates and hours as bases for mak- 
ing awards. These facts for unions cannot be gotten from 
employers; their scales do not necessarily express union experi- 
ence. Unions must supply the material. 

The Massachusetts Bureau of Statistics in its Labor Di- 
vision collects and publishes statistics of organized labor 
relating to union scales of wages and hours, number and membership 
of unions, unemployment, strikes and lockouts, etc. 
Each of these will be touched upon briefly inasmuch as they 
probably represent the most accurate and complete data on 
organized labor now regularly collected by any state statistical 
bureau in the United States. 

A report on union scales of wages and hours is regularly 
issued. The data are furnished entirely by unions and are 
published as reported, no inquiry being made as to the extent 
to which the union scales prevail in the various trades and 
localities. That is, minimum rates and not those actually received 
by union labor are published. The process of collection 
may be indicated by reference to the 1913 report. Returns 
by schedule were received from 1093 unions, or 78 per cent 
of those in the state. By the use of special agents 200 more 
were obtained, so that 92 per cent of the locals in the state 
were included. In tabulated form they show rates of wages 
by the hour, day, week, overtime (hour), and Sunday and 
holiday (hour); and hours of labor, by the day, week, and the 
period in which half-holidays are in effect, all classified for 
occupations and for municipalities. 
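
The coverage figures quoted for the 1913 report can be checked against one another with simple arithmetic. The sketch below uses only the numbers stated in the text (1093 returns, 78 per cent, 200 further returns); the total number of unions in the state is not given and is merely implied by those figures, so it is treated here as an estimate.

```python
by_schedule = 1093         # unions returning schedules (from the text)
share_by_schedule = 0.78   # stated share of the unions in the state
by_agents = 200            # further returns obtained by special agents

# Total number of unions implied by the first two figures (an estimate only).
implied_total = by_schedule / share_by_schedule

# Share of unions covered once the agents' returns are added.
coverage = (by_schedule + by_agents) / implied_total

print(round(implied_total), round(100 * coverage))
```

The implied total is roughly 1,400 unions, and the combined returns come to about 92 per cent of that number, which agrees with the percentage reported.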

Statistics on the number and membership of unions have 
been systematically collected and published since 1908. The 
collection is mainly by schedule and includes national and in- 
ternational unions with affiliated locals in Massachusetts, their 
relationship to the American Federation of Labor, the number 
of chartered local unions and the proportion in Massachusetts 
with their membership, classified for the sexes separately, by 
municipalities, occupations, industries, etc. 

Statistics on unemployment among organized wage earners 
are issued quarterly. The data are collected from unions solely 
by schedule and are published so as to reveal the amount of 
unemployment by cities and occupations due to lack of work 
or material, unfavorable weather, strikes or lockouts, sickness, 
accident or old age, and other reasons, the latter specified in 
detail. Approximately 75 per cent of the locals are included 
in each quarterly report. 

Statistics on strikes and lockouts have been collected by the 
Massachusetts Bureau since 1881. Unions and employers are 
scheduled on the basis of information supplied by newspapers, 
trade journals, etc. Besides certain preliminary data the fol- 
lowing facts are secured from unions: the names of employers 
affected, conditions demanded by strikers, conditions before 
and granted after strikes, who ordered strikes, the occupations 
and numbers of strikers (the latter by sex), the dates on which 
strikers left and resumed work and on which strikes were 
ended, as well as the methods of settlement. From employers 
those questions of the above which apply and the following 
are asked: the number of employes who struck, classified by 
sex; the number of non-strikers thrown out of work, classified 
by sex; the time lost by non-strikers; measures used by strikers 
to regain their positions, etc. In approximately 50 per cent 
of the cases the returns from the two sources are so contradictory 
as to necessitate the use of special agents to obtain 
the facts.¹ Even by this method in many cases the facts 
prove to be so indeterminate that the reports are published 
only on the basis of what seem to be the facts after all evidences 
are given their appropriate weight. These reports, 
therefore, appear to be summaries of reported or estimated 
facts concerning industrial disputes (knowledge of which is 
received through the press, by hearsay, or by other means), 
having little value alone in connection with wage studies, and 
chiefly of interest for informational and not for functional 
use.² 

¹ Estimated for the writer by the Division Chief. New Jersey, placing 
complete reliance in newspaper clippings for initial information and 
depending altogether for the facts secured on schedules from unions 
alone, publishes an annual report on strikes and lockouts. If the 
experience of Massachusetts respecting like data is worth anything, 
statistics thus collected stand condemned. 

² A detailed estimate of the value of these and like data compiled by 
the Bureau is not attempted here. It was made, however, by the writer 
during the summer of 1914 for the United States Commission on Industrial 
Relations. 

Without citing further detail of the practices and experiences 
of American statistical bureaus in securing wage and 
allied data from trade unions, sufficient has been said to indicate 
the problems and possibilities in this approach to the 
study of wages. In all cases nominal and minimum rates are 
involved and these are reported under conditions which make 
it difficult, if not impossible, to apply them to unemployment 
data in any attempt to approximate earnings from labor service. 
When properly checked by scrutinizing trade agreements, 
nominal hours and time-rates from this source may be determined 
with reasonable accuracy. Any attempt, however, to 
secure piece-rates on an extended scale from this source is 
bound to prove unsuccessful. Unemployment data from unions 
at best are approximations, and, of course, refer only to union 
labor. They serve fairly well to give a general notion of 
seasonal displacement of labor and of trade depression or boom 
but are of little value in measuring earnings or economic distress. 
Statistics of strikes and lockouts as collected may serve 
as a rough measure of the frequency of labor disturbances but 
not of their consequences nor of the correction which it is 
necessary to make for this cause when estimating wages 
from wage-rates. 

In summary, we may briefly relate the statistical data extant 
on wages to the various concepts which this term suggests. 

Comprehensive data on wages as defined above do not exist 
in the United States.¹ For annual reports for all manufacturing 
industries on classified wage-rates for short pay-periods, 
where conceivably wage-rates are equivalent to earnings 
(assuming neither over-time nor time lost), we may turn to 
Massachusetts, to New Jersey (“earnings” in this state), and 
to Ohio.² Studies of classified wage-rates for special industries 
are periodically made by the United States Bureau of 
Labor Statistics. In order to use nominal and minimum wage-rates 
as equivalent to wages it is necessary to assume that 
nominal conditions are actual, that figures are reported accurately, 
and to correct rates by figures on unemployment supplied 
by unions, by employers, or by employes. The reliance 
which can be placed in union figures on strikes and other 
causes of unemployment has been suggested above. The importance 
to be assigned to fluctuations in the employed force, 
as indicated by the average or actual number of employes at 
various times in each year, depends largely upon the fluidity of 
labor, the ability of wage earners to find employment, and the 
complementary character of industries, studies of which on a 
significant scale have not been made. The fact of unemployment 
is known but it is next to impossible, except in intensive 
studies, to measure it by applying to those affected. The 
United States Census Bureau attempts to measure it from this 
source but the best that is secured is a rough approximation.³ 
Moreover, it is chiefly among unskilled labor that unemployment 
is greatest, and union figures do not furnish the desired 
facts. Wages, therefore, in the sense in which the term is 
used here are not available in any other form than as 
estimates. 

¹ Nothing is said about our present national income tax statistics. The 
exemption allowed is so high as to omit most “wage earners,” and the 
returns are not published in a form suitable for estimating earnings for 
such groups. See Falkner, R. P., “Income Tax Statistics,” Publications 
of the American Statistical Association, June, 1915, pp. 521-549. 

² Not restricted to manufacturing industries in this state. 
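
The correction of wage-rates by unemployment described in the summary can be illustrated with a deliberately simple calculation. It is a sketch under strong assumptions, not a recommended estimate: the weekly rate and the weeks of unemployment are hypothetical, nominal conditions are taken as actual, over-time is ignored, and the function name and parameters are introduced here for illustration.

```python
def estimated_annual_earnings(weekly_rate, weeks_in_year=52, weeks_unemployed=0.0):
    """Rough estimate: full-time weekly rate times the weeks actually worked."""
    if not 0 <= weeks_unemployed <= weeks_in_year:
        raise ValueError("weeks_unemployed must lie between 0 and weeks_in_year")
    return weekly_rate * (weeks_in_year - weeks_unemployed)


# Hypothetical wage earner with a nominal rate of $12.00 per week.
nominal = estimated_annual_earnings(12.00)
corrected = estimated_annual_earnings(12.00, weeks_unemployed=6)

print(nominal, corrected)
```

The difference between the two figures shows how much the result depends upon the unemployment figure, which is the very item the text finds least reliable.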

On the other hand, wage-rates for short periods, taken from 
employers’ payrolls for manufacturing and some other industries, 
are reported with reasonable accuracy to a few state 
bureaus. In these cases, industries constitute the units, individuals 
and occupations being lost sight of in the grouping 
process. To supplement such data there are the nominal wage-rates 
reported by unions in which distinctions are made for 
occupations, industries, sexes, etc. The data are supplementary 
but not comparable. At least no comparisons of rates 
are currently published by bureaus to which both sets of facts 
are reported. 

Earnings, in the sense of income from labor service without 
distinction being drawn between wages and salaries, and in 
contrast to property income, may roughly be approximated 
from the income and expenditure accounts of industrial and 
other businesses.⁴ 

³ A question on unemployment was first included in the population 
schedule by the United States Census in 1880. The information secured, 
however, was never published. In the three succeeding censuses a similar 
inquiry was included, the form in 1910 being “whether out of work 
on April 15, 1910” and “number of weeks out of work during the year 
1909.” 

⁴ See the studies of Nearing, op. cit., pp. 18-52; Streightoff, F. H., 
op. cit., pp. 44, passim. 

III. A Study of Wages: Declaration of Purpose, 
Definition of Units, Schedule Forms 

Without considering the types and sources of data on salaries 
and salary-rates, and without treating prices in relation 
to wages and wage-rates, we pass immediately, in order to 
illustrate the preceding treatment, to a discussion of a wage 
problem upon which it is intended to collect primary data. 
Criticism of the substance, form of tabulation, and interpre- 
tation of existing secondary data must rest with the brief 
sketch given above. The immediate problem, then, is to state 
definitely the purposes of the study which is intended to be 
made, to outline the plan to be followed, to define the units to 
be used, to formulate schedules, and to outline suggestions for 
the receipt and editing of returns. The precise use which 
will be made of the data will, of course, be determined in part 
by the character of the replies and can be only tentatively out- 
lined in advance. It is intended, however, to establish certain 
relations and make certain comparisons between the facts 
reported, and the tabulations will be adjusted to these ends. 

1. DECLARATION OF PURPOSES 

The problem which has been chosen for study is the wage 
conditions in the textile industry in North Carolina for the 
year 1914. For convenience, the survey is restricted to manufacturers 
of cotton goods, including small wares. On the basis 
of information collected, schedules will be sent to 100 estab- 
lishments which were found to be doing this business at some 
time during the year, the basis for listing establishments, sep- 
arately, being that outlined in the schedules. The purpose of 
the questionnaire is (1) to determine the level of wage-rates 
for the sexes separately by age groups; (2) to measure the 
seasonal fluctuations in employment in relation to (a) princi- 
pal product produced, (b) form of business organization; 
(3) to determine the total amount paid during the year in 
wages to employes of different sex according to an age classification; 
(4) to study the relation of wage-rates to (a) the 
form of business organization, (b) principal product produced, 
(c) seasonal fluctuations in number employed. 
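
As one possible illustration of purpose (1), the level of wage-rates for a group may be summarized by a median interpolated from classified weekly rates of the kind requested in item 9 of the schedule. This is a sketch only: the median is one summary among several and is not stated in the text to be the measure used, the frequencies are hypothetical, and a class such as "$3 to $3.99" is assumed to cover the interval from 3.00 up to but not including 4.00.

```python
def grouped_median(classes):
    """Interpolated median for ordered, contiguous classes.

    classes: list of (lower_limit, upper_limit, number_of_wage_earners).
    """
    total = sum(count for _, _, count in classes)
    half = total / 2
    cumulative = 0
    for lower, upper, count in classes:
        if count and cumulative + count >= half:
            # Interpolate within the class that contains the middle item.
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    raise ValueError("no wage earners reported")


# Hypothetical distribution for one sex and age group.
classes = [(3.0, 4.0, 10), (4.0, 5.0, 30), (5.0, 6.0, 40), (6.0, 7.0, 20)]
median_rate = grouped_median(classes)

print(median_rate)
```

An open-ended class such as "Under $3" has no lower limit, so this interpolation applies only when the middle item falls in a closed class.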

The schedules are formulated with these purposes in mind, 
and it is intended that they shall be filled in by employers with- 
out supervision other than that which is received from the in- 
structions contained in the schedules. The study is undertaken 
with the assumption that it has sufficient sanction, that the 
filing of the returns is obligatory, that returns for individual 
establishments are not to be published separately, and that the 
results of the study will be of general social interest in which 
informants share equally with others. Sufficient time is to be 
allowed for full reports to be made and tabulations and anal- 
ysis are not to be begun until satisfactory returns are received 
from all establishments concerned. No attempt is to be made 
to supplement the data collected from employers by sched- 
uling either individual employes or unions. Complementary 
material may be secured from these sources but in this study 
it is intended to rely wholly upon returns from employers. 

It must clearly be kept in mind that the discussion imme- 
diately above is illustrative of the steps which would have to 
be taken in the study of such a subject as wages. The facts 
have been given somewhat more in detail than would have 
been necessary had the purpose been merely to describe the 
data on wages and wage conditions in the United States. 
Moreover, it must be remembered that the requirement that 
all of the schedules must be returned is rather more severe 
than would be made in actual statistical work. The aim has 
been to set up the procedure which should be followed in an 
actual investigation. Of course, it is not possible entirely to 
do this, but the nearer it can be done, the more interest the 
student will have in his work and the more value he will get 
from it. That which is sometimes considered to be meaningless, 
routine clerical work may, by paralleling as nearly as 
can be a real problem, frequently be thought to be both necessary 
and vital. Great value comes from having a student 
see a problem as a whole and the correlation of the different 
parts. By so doing the meaning of all the statistical steps 
through which he is led takes on new light. He is then not 
so much studying method as a problem to which method is 
vital in its explanation. Most mature minds desire to see 
some goal to their activities and reasons for the methods of 
study which are used. And this is as it should be, for then 
individuality is bound to reveal itself and the use of statistics 
becomes more than mere routine labor. 

2. SCHEDULE AND EXPLANATION 

The X. Y. Commission of North Carolina 

RALEIGH, NORTH CAROLINA 

It is desired to make a study of the wages and wage conditions for 
the calendar year 1914 in the establishments in North Carolina 
which manufacture cotton goods, including small wares. All concerns 
in the state doing such business are included in this survey. 
The study is undertaken in accordance with the provisions of law 
(see Chapter 673, laws 1914), and your cooperation in making it a 
success is respectfully solicited. Individual returns will not be published 
separately, and every care will be taken to hold the facts 
reported confidential. All employers submitting the reports called 
for will be furnished gratis with copies of the complete report as 
soon as published. 

Read the whole schedule through before answering the individual 
questions. Accurate answers according to permanent records are 
required on all questions. 

Use the enclosed self-addressed and stamped envelope for returning 
the schedule. Schedule should be returned not later than 
April 30, 1915. 

The X. Y. Commission, 

Raleigh, North Carolina. 

I hereby affirm that the accompanying report is accurate and 
complete to the best of my knowledge, and is made according to the 
permanent records of this establishment. 

Name of Concern ........  Name of Secretary or other person making the return ........ 

P. O. Address ........  Month ........  Year ........ 

Schedule to be Used in the Collection of Wage Data by Establishments 
in the Manufacture of Cotton Goods, Including Small Wares, 
North Carolina, Year 1914. 

1. Name of Establishment 

Use a separate schedule for each establishment. By an establish- 
ment is meant a plant or mill, the accounts of which are kept sepa- 
rately. Where separate plants are owned in common but carried on 
under one set of books, such separate plants are reported together 
as one establishment. 

2. Name of Corporation, Firm, or Individual Owner 


3. Location of Factory 

County ........  City or Town ........ 

Street and No. ........  P. O. ........ 

The location should be that of the physical plant and not of the 
financially controlling head. 

4. Character of Business Organization: 
( ) Individual Firm   ( ) Partnership   ( ) Corporation 

Indicate whether individual firm, partnership, or corporation by 
checking thus ( ✓ ) the appropriate term. 

5. Frequency of Payment: ( ) Weekly   ( ) Fortnightly 
Time- or Piece-Rates: ( ) Time   ( ) Piece 

Indicate the frequency of payment, and whether time- or piece- 
rates prevail by checking thus ( ✓ ) the appropriate terms. 

6. Character of Industry 

Indicate by giving principal product manufactured. 

Please be specific respecting the principal product. The data 
are necessary for accurately editing the returns. 

7. Number and sex of Wage Earners, both time- and piece-workers; 
not salaried employes. 

Wage earners are persons receiving money or its equivalent be- 
cause of manual, mechanical, or clerical labor service, paid according 
to a stipulated scale at frequent intervals, and under conditions 
which make it customary to make deductions for short periods of 
time lost. These should be included. 

By salaried employes are meant persons receiving money or the 
equivalent because of responsible, supervisory, or directive labor 
service, paid according to a stipulated scale at infrequent intervals 
and under conditions where it is not the custom to make deductions 
for short periods lost. These should be omitted. 

                                   A                 B                 C
Age and Sex of Employes            Greatest Number   Least Number      Total Amount
                                   Employed at Any   Employed at Any   Paid in Wages
                                   Time During the   Time During the   During the
                                   Year              Year              Year

Men 18 years of age and over       ____              ____              ____
Women 18 years of age and over     ____              ____              ____
Young persons under 18 years
of age:
    Boys                           ____              ____              ____
    Girls                          ____              ____              ____

8. Number and sex of Wage Earners employed on the 15th of each 
month, 1914. If data are not obtainable for this day, enter the
same for the nearest representative day. 


                    Number of Wage Earners, Both Time- and Piece-workers,
                    Employed on the 15th Day of Each Month

Data to be of the   Adults 18 Years and Over    Young Persons Under 18 Years
15th of the Month   Males        Females        Males        Females

January             ____         ____           ____         ____
February            ____         ____           ____         ____
March               ____         ____           ____         ____
April               ____         ____           ____         ____
May                 ____         ____           ____         ____
June                ____         ____           ____         ____
July                ____         ____           ____         ____
August              ____         ____           ____         ____
September           ____         ____           ____         ____
October             ____         ____           ____         ____
November            ____         ____           ____         ____
December            ____         ____           ____         ____



9. Classified Weekly Wage-rates for the Week of the Greatest
Employment during the year 1914.

Do not include over-time; short-time earnings should be reduced
to a full-time basis; bonuses and premiums, if any, should be
included. Fines and similar deductions should be excluded.


                           Number of Wage Earners, Both Time- and Piece-workers,
                           Receiving Specified Wage-rates Per Week

Specified Wage-rates       Adults 18 Years of Age      Young Persons Under
Paid for the Week          and Over                    18 Years of Age
Ending ____________        Males        Females        Males        Females

Under $3 per week          ____         ____           ____         ____
$3 to $3.99 per week       ____         ____           ____         ____
$4 to $4.99 per week       ____         ____           ____         ____
$5 to $5.99 per week       ____         ____           ____         ____
$6 to $6.99 per week       ____         ____           ____         ____
$7 to $7.99 per week       ____         ____           ____         ____
$8 to $8.99 per week       ____         ____           ____         ____
$9 to $9.99 per week       ____         ____           ____         ____
$10 to $10.99 per week     ____         ____           ____         ____
$11 to $11.99 per week     ____         ____           ____         ____
$12 to $12.99 per week     ____         ____           ____         ____
$13 to $13.99 per week     ____         ____           ____         ____
$14 to $14.99 per week     ____         ____           ____         ____
$15 to $15.99 per week     ____         ____           ____         ____
$16 to $16.99 per week     ____         ____           ____         ____
$17 to $17.99 per week     ____         ____           ____         ____
$18 to $18.99 per week     ____         ____           ____         ____
$19 to $19.99 per week     ____         ____           ____         ____
$20 to $20.99 per week     ____         ____           ____         ____
$21 to $21.99 per week     ____         ____           ____         ____
$22 to $22.99 per week     ____         ____           ____         ____
$23 to $23.99 per week     ____         ____           ____         ____
$24 to $24.99 per week     ____         ____           ____         ____
$25 and over per week      ____         ____           ____         ____
CHAPTER VI

CLASSIFICATION— TABULAR PRESENTATION 


I. Introduction

Statistical data which are to be tabulated are taken from 
primary or secondary sources or from both. If from primary
sources, they are generally recorded on blanks used in per- 
sonal interviews, on form or circular letters, or on question- 
naires. In this form they are not suitable for analysis; they 
must be edited for consistency, accuracy, and completeness 
preparatory to being tabulated, averaged, and compared. If 
they are taken from secondary sources, some form on which 
to assemble them must be devised, provided the plan of ar- 
rangement in which they are found is unsuitable for that 
purpose. The process of orderly arranging data into columns 
and lines capable of being read in two dimensions is called 
“tabulation.”

Tabulation, however, is an inclusive term. It may be dis- 
cussed from three points of view: (1) the determination of the 
characteristics of data which are to be tabulated; (2) the 
manner in which they are to be classified; and (3) the form in 
which the classification is recorded in tables. 

II. The Characteristics of Data to be Tabulated

To place statistical data in an orderly arrangement presup- 
poses a purpose. When purpose is absent, disorder is found. 
Before data can be orderly arranged, however, their charac- 
teristics must be determined. The questions relating to this 
subject are as follows: 
1. WHAT ARE THE CHARACTERISTICS OF ANY BODY OF DATA?

The characteristics of data are their distinctive qualities or 
properties. Data showing the expenses of operating retail 
stores, for instance, vary according to volume of business done, 
location of the stores, their age, kinds of goods sold, types of 
management, etc. Farms differ in size, types of soil, owner- 
ship, productivity, position, etc.; accidents vary according to 
severity, nature of injury, and frequency. On the basis of 
any or all of these characteristics an orderly arrangement can 
be made of such data. But other questions are immediately 
suggested. 

2. IN WHAT WAY OR WAYS ARE THE CHARACTERISTICS 
RELATED TO EACH OTHER? 

(1) They may be mutually exclusive or inclusive. For instance,
a retailer’s sales of suits of clothes, shoes, and umbrellas
are mutually exclusive. On the other hand, his total sales are 
inclusive because they are made up of the sales of different 
types of merchandise. The location and fertility of farms, 
the age and sex of clerks, however, are mutually exclusive — 
they have no component parts. 

(2) Some characteristics are primary while others are sec- 
ondary. For instance, the total inventory value of goods on 
hand is secondary; the basis on which the value is taken is 
primary. The first depends on, or is a function of, the second. 

(3) They may stand in the order of cause and effect. High 
wages and high operating expenses; increasing prices and increasing
(dollar) volume of sales; limited production and high
prices of cereals; large receipts and low prices of hogs at 
Chicago, etc., may be related in this way. 

(4) They may be associated but not causally related as, for 
instance, the amount of credit sales and the volume of business 
done; turnover of goods and profits on sales. 

(5) They may have no apparent relation to each other as, 
for instance, the methods of advertising specialty goods, and 
the costs incurred; the methods of taking inventories and the
frequency with which they are taken; the amounts of wages 
paid and the frequency of payment; the stature of a person 
and his earning capacity. 

3. CAN DATA BE EXPRESSED IN SERIES WITH RESPECT 
TO TIME, SPACE, OR CONDITION? 

Price differences, for instance, may be shown by days 
(time), by terminal markets (space), and by amount of variation
or frequency of occurrence (condition).

4. ARE SOME CHARACTERISTICS CUMULATIVE WHILE 
OTHERS ARE NOT? 

Amounts of sales, for instance, may be cumulated over a 
period of years; the customary method of paying salesmen, on 
the other hand, and the number of employes on the payroll of 
Department A in Factory B on a given pay day do not admit 
of such treatment. 

Other peculiarities of the characteristics or properties of 
data will suggest themselves. What they are and the relation- 
ship between them determine the nature of the classification 
which is followed. But what is meant by “classification” and
what does the process involve? 

III. The Nature of Classification 

Classification, as it relates to statistics, is the process of
arranging data into sequences and groups according to their
common characteristics: of separating them into different but
related parts. Some may be co-ordinate; others subordinate. 
It represents a process of thought — a way of analyzing a prob- 
lem. The nature of the arrangement depends upon the char- 
acteristics themselves, the relations which they bear to each 
other, and the purpose which is to be realized in classifying 
them. 

“Performed consciously or unconsciously, the act of classification
is indispensable to and accompanies every scientific inference. A
mind is orderly or slovenly, according as it does or does not habitually
and accurately classify the facts with which it comes in contact.
The success of an investigation, the worth of a conclusion, are in
direct proportion to the fidelity to this principle and the exhaustiveness
with which the process is carried out.”*


But what are common characteristics? To be “common”
they must have the same properties: that is, be alike. But
“likeness” is relative, not absolute. The cruder the classification,
the more alike data seem to be; the finer it is, the greater
the differences which are found. 

The method of classifying the characteristics of statistical 
data can be shown by the use of examples. Certain data are 
available about retail stores, for instance. How may they be 
classified? The location, sales, expenses, inventories, pur- 
chases, and floor space are mutually exclusive categories. But 
each of these characteristics may be broken up into separate 
parts. For instance, the expenses of operation may be divided 
into the amounts spent for rent, wages and salaries, advertis- 
ing, “busheling” (remodeling), and a number of “miscel- 
laneous” items. The wages and salaries item is made up of 
amounts paid to salesmen and to proprietors, and the part 
paid to salesmen is composed of the amounts paid to those 
giving either full or part time. The compensation of full-time 
salesmen may be salaries or commissions, and the commis- 
sions may be fixed or fluctuating. 

The employes of a factory may be similarly classified. 
They differ according to sex, but each sex group has its own 
characteristics. The males may be German or Swedish, the 
Swedish be native or foreign born, the foreign born be ma- 
chine tenders or common laborers, and the common laborers 
be paid on an hourly or a daily basis. 
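
The retail-store example above may be sketched as a nested structure in which each class holds its subordinate classes. This is only an illustrative sketch: the variable `expenses` and the helper `paths` are hypothetical names, and the structure records nothing beyond the divisions named in the text.

```python
# Hypothetical sketch: the expense classification described in the text,
# written as a nested dict. An empty dict marks a most specific class.
expenses = {
    "rent": {},
    "wages and salaries": {
        "proprietors": {},
        "salesmen": {
            "part time": {},
            "full time": {
                "salaries": {},
                "commissions": {"fixed": {}, "fluctuating": {}},
            },
        },
    },
    "advertising": {},
    "busheling (remodeling)": {},
    "miscellaneous": {},
}


def paths(tree, prefix=()):
    """Yield every path from the most inclusive class to a most specific one."""
    for name, sub in tree.items():
        here = prefix + (name,)
        if sub:
            yield from paths(sub, here)
        else:
            yield here


# Read each line from the general (left) to the specific (right).
for p in paths({"expenses of operation": expenses}):
    print(" > ".join(p))
```

Each printed line runs from the most inclusive class to the least inclusive one, which is the order of thought the text describes.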

Classification of things or the attributes of things proceeds
from the general to the specific; from the most inclusive to the 

* Cramer, Frank, The Method of Darwin: A Study in Scientific
Method, McClurg, Chicago, 1896, p. 88.
least inclusive characteristics. Co-ordinate classes are grouped
together, those which are subordinate being made subsidiary. 
For instance, purchases and sales are co-ordinate classes. So, 
also, are purchases of furnishings and of clothing, and purchases
of men’s and boys’ clothing. On the other hand, inventories
of men’s suits occupy a subordinate position to
inventories of men’s clothing. 

Whether characteristics are primary or subordinate, co- 
ordinate or inferior, of course, depends upon the way in which 
they are viewed and the purpose which is in mind in ar- 
ranging them. In all cases, however, the order of thought is 
from the general to the specific. A logical scheme of classifi- 
cation is made in keeping with this general principle. 

In some cases the method to be followed is established — it 
proceeds according to a pattern already worked out. Under 
such conditions, the process is automatic, clerical, routine. On 
the other hand, classifications are made to present, suggest, or 
detect relationships when they are not apparent, and when 
there is no guide which may be followed. Such a classification 
is constructive, not repetitive; creative, not clerical. To dupli- 
cate a classification is easy; to conceive one in order to test 
an hypothesis is difficult. It is one thing to classify the
characteristics of data in keeping with instructions; it is another
to determine the characteristics according to which classifi- 
cation should be made where no pattern is to be followed. 

IV. The Meaning of Tabulation

To tabulate data is to place them in tables (flat surfaces
“with width not disproportionately small in comparison with
length”) in keeping with the characteristics which have been
identified and with the relations between them. The scheme
involves the use of two dimensions or axes. The units in
which the measurements are made generally, although not
always, appear in the “caption”: that is, in the vertical classes.
The ways in which the measurements are presented generally,
although not always, appear in the “stub”: that is, in the
horizontal classes. A tabulated datum, therefore, is found at the
intersection of the vertical and the horizontal axes. It has the
characteristics shown in the caption and is presented from the 
point of view indicated in the stub. Tabulation follows and is 
distinct from classification: to tabulate is to record data in 
keeping with a classification. 

The tabulation form is made up of a series of “boxes,”
described in the captions and stub headings, into which are
sorted data having the characteristics discovered through
classification. The boxes or “pigeon holes” have fixed positions:
they cannot be changed nor the sequences in which they
are found altered without recasting the scheme of tabulation.
To choose a new form, however, is not to discover new nor to 
discard old characteristics. They are simply presented in a 
different way. 

The following statistical facts in the form presented are not 
tabulated; they cannot be read in two dimensions:

“Employes hired during 1923: men, 536; women, 844. Withdrew,
men during the year, 31; at the close of the year, 37; women,
during the year, 37, at the end, 68. Men employes at beginning
of 1924 from those hired during 1923, 458; those who had formerly
been with the company, 51; new men, 40. Women employes at
the beginning of 1924, from those hired during 1923, 739; those who
had formerly been with the company, 19; and those who were
new, 34.”

Classification of data for purposes of tabulation, as noted 
above, is either automatic or experimental. 

Where the form of tabulation has been determined and data 
are distributed according to a scheme already provided, the 
process is as follows: 

(1) Begin with the stub. Classify the data first according 
to the most inclusive characteristic, and second, classify suc- 
cessively each subordinate part as there provided. The order 
of procedure, therefore, is from the general to the specific. 

(2) For the most detailed characteristic in the stub, first, 
classify the data according to the most general characteristic 
named in the caption; and second, classify successively each
independent part provided in the caption. The order of
classification in the caption, therefore, proceeds from the general
to the specific, but in keeping with the requirements as 
established in the stub. 

For tabulations which are not made according to fixed form, 
that is, for tables the purpose of which is to present, suggest, 
or to detect direct or associated relationships between the char- 
acteristics of data, the method is more complicated. 

By a process of reasoning, trial relations between the char- 
acteristics of the data are first established. The data are then 
classified in keeping with these relations and distributed in 
a table according to caption (column) and stub (line) head- 
ings, as in (1) and (2) above. If the results which are secured 
are inconclusive, or of no significance (the relations which
were thought to obtain not having been developed), the basis
of the classification is probably without significance, although 
the tabulation may be correct. If this is so, it is necessary to 
establish other bases of classification and to follow the pro- 
cess of trial and error until the desired end is accomplished 
or proved to be impossible of realization. 

In order to tabulate the data of p. 129, for instance— a 
form of tabulation not having been previously prepared — it is 
necessary to proceed as follows: 

(1) Pick out the co-ordinate classes. These are as follows: 
men and women; years 1923 and 1924; number hired; time of 
withdrawals, etc.

(2) Place in the caption the classes enumerated, and in 
the stub the bases according to which the classes are to be 
distinguished or the points of view from which they are to be 
presented. 

(3) Record in the body of the table by column and line the 
number of instances fulfilling the conditions named therein. 

(4) Add the different parts of the co-ordinate classes. To 
total the columns, combine the classes in the stub; to total the 
lines, combine those in the caption. 
The tabulated data would then appear in somewhat the form
shown in Table 1.

TABLE 1

Table Showing by Sex the Nature of Changes in an
Employed Force in Factory “A,” 1923 and 1924

                                                         Sex of Employes
Years   Change in Employed Force               Total     Men      Women

1923    Hired during the year 1923             1380      536      844
        Withdrawals
            During the year                      68       31       37
            At close                            105       37       68
            TOTAL (deduct)                      173       68      105
        TOTAL force at end of year 1923
            and beginning of 1924              1207      468*     739

1924    Hired during the year 1924
            Formerly with the company            70       51       19
            New employes                         74       40       34
            TOTAL (add)                         144       91       53
        TOTAL at end of year                   1351      559      792

* Incorrectly given as 458.


Tables depicting the same body of data may take widely 
different forms. Table 1 is used only to illustrate the problem 
under discussion. 
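
The four steps above, applied to the figures quoted in the running text, may be sketched as follows. This is a minimal illustration rather than a prescribed procedure: all variable names and the helper `row` are hypothetical, and the layout is condensed compared with Table 1.

```python
# Figures as quoted in the running text. The men's carry-over into 1924
# is entered as printed there (458) so that it can be checked below.
hired_1923 = {"men": 536, "women": 844}
withdrew_during = {"men": 31, "women": 37}
withdrew_at_close = {"men": 37, "women": 68}
carried_into_1924_as_stated = {"men": 458, "women": 739}
former_1924 = {"men": 51, "women": 19}
new_1924 = {"men": 40, "women": 34}

SEXES = ("men", "women")  # the co-ordinate classes placed in the caption


def row(label, by_sex):
    """Return one table line: label, line total, then one figure per sex."""
    return (label, sum(by_sex[s] for s in SEXES), *(by_sex[s] for s in SEXES))


# Combine the classes of the stub to obtain the column figures.
withdrawals = {s: withdrew_during[s] + withdrew_at_close[s] for s in SEXES}
force_start_1924 = {s: hired_1923[s] - withdrawals[s] for s in SEXES}
added_1924 = {s: former_1924[s] + new_1924[s] for s in SEXES}
force_end_1924 = {s: force_start_1924[s] + added_1924[s] for s in SEXES}

table = [
    row("Hired during 1923", hired_1923),
    row("Withdrawals (deduct)", withdrawals),
    row("Force at beginning of 1924", force_start_1924),
    row("Hired during 1924 (add)", added_1924),
    row("Force at end of 1924", force_end_1924),
]

print(f"{'Change in employed force':<30}{'Total':>7}{'Men':>7}{'Women':>7}")
for label, total, men, women in table:
    print(f"{label:<30}{total:>7}{men:>7}{women:>7}")

# Compare the stated carry-over with the computed one, class by class.
discrepancies = {
    s: (carried_into_1924_as_stated[s], force_start_1924[s])
    for s in SEXES
    if carried_into_1924_as_stated[s] != force_start_1924[s]
}
print(discrepancies)
```

The final comparison singles out the men's figure, which is the item Table 1 marks as incorrectly given.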

V. The Advantages of Tabular Over Non-tabular
Arrangement

Statistical data arranged in tables have definite advantages 
over those descriptively stated. The order in the latter case 
may have no logical basis; it may be according to chance or 
as the items were remembered or jotted down. 
1. THE ORDER OF ARRANGEMENT OR THE PLAN OF PRESENTATION

When tabulations are used, some formal order is generally 
followed. Those most commonly used are as follows: 

(1) Arrangement According to the Size or Frequency 
of the Items 

The United States Census Bureau, for instance, tabulates 
in a descending order the amounts of capital, values of product, 
etc., in manufacturing industries. The same method is fol- 
lowed by the Life Insurance Sales Research Bureau in tab- 
ulating by states and by districts the sales of life insurance 
companies. Sometimes, an ascending order is used. In either 
case, the method of presentation is consistent and emphatic. 

When the arrangement is ascending or descending, the posi- 
tions of the items in the series should not be ranked by the 
use of consecutive numbers, as 1st, 2d, 3d, etc. The items 
appear in this order but the frequency or amount of the dif- 
ferences between them is not properly described in this man- 
ner. That this is true, in a typical case, is shown in Table 2. 

TABLE 2

Table Showing the Names of Industries and Numerical
Ranking by Value of Product
(United States Census of Manufactures, 1909)

                                           Value of Product, 1909
                                                                     Difference
                                                          Rank of
Industries                                 Amount         Industry   Amount         Per Cent   Rank

Leather, tanned, curried, and finished     $327,874,187   18
Butter, cheese, and condensed milk          274,557,718   19         $53,316,469    19.42      1
Paper and wood pulp                         267,656,964   20           6,900,754     2.58      1
Automobiles, including bodies and parts     249,202,075   21          18,454,889     7.41      1
Smelting and refining lead                  167,405,650   30          81,796,425    48.86      9
A change in rank of one, in value of product, is shown
to result from an absolute difference varying from approxi- 
mately seven to fifty-three and one-third million dollars, 
and from a relative difference ranging from 2.58 to 19.42 
per cent. In one instance, a change in rank of one requires 
five-eighths as large an amount as is necessary in another 
case to occasion a change in rank of nine. In cases where 
it is desired to use an ascending or descending order and to 
indicate in a scale the positions of the different amounts, it 
is far better to reduce them to relative numbers, using the 
beginning, the last, or an average of all as a base, than to 
use consecutive numbers. 
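
The contrast between consecutive ranks and relative numbers may be illustrated with the values of Table 2. The sketch below is an assumption-laden illustration: it takes the first item as the base for the relative numbers (the text allows the beginning, the last, or an average), and it expresses each difference as a per cent of the smaller of the two amounts, a choice that reproduces the 19.42 and 2.58 quoted above. The variable names are hypothetical.

```python
# Values of product, 1909, as given in Table 2 (dollars).
industries = [
    ("Leather, tanned, curried, and finished", 327_874_187),
    ("Butter, cheese, and condensed milk", 274_557_718),
    ("Paper and wood pulp", 267_656_964),
    ("Automobiles, including bodies and parts", 249_202_075),
    ("Smelting and refining lead", 167_405_650),
]

base = industries[0][1]  # assumption: the first (largest) item is the base

rows = []
previous = None
for name, value in industries:
    relative = round(100 * value / base, 1)  # relative number, base = 100
    if previous is None:
        diff, pct = None, None
    else:
        diff = previous - value
        # Difference as a per cent of the smaller (current) amount.
        pct = round(100 * diff / value, 2)
    rows.append((name, value, relative, diff, pct))
    previous = value

for name, value, relative, diff, pct in rows:
    diff_text = "" if diff is None else f"{diff:,}"
    pct_text = "" if pct is None else f"{pct:.2f}"
    print(f"{name:<42}{value:>13,}{relative:>7.1f}{diff_text:>13}{pct_text:>7}")
```

Unlike the ranks, the relative numbers show at a glance how unequal the steps between successive industries are.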

(2) Arrangement According to Time

All data of an historical character must of necessity be pre- 
sented in chronological order. The amounts or frequencies 
may be alike or different. This fact, however, is ignored when
the time element controls. Time is continuous and unbroken, 
and its continuity must be preserved. 

(3) Arrangement According to Space

Suppose it is desired to construct a table showing by states 
the number of tenant farmers. The table might be arranged 
according to the frequency of the occurrence of this phenom- 
enon. In this case, certain of the Southern states would, 
undoubtedly, occupy first place. If contiguous position were 
followed, the states would be listed not according to the fre- 
quency of the phenomenon, but in the order in which they 
occur with relation to each other. If South Carolina were 
listed first, Georgia and North Carolina would follow imme- 
diately. Undoubtedly, such an arrangement would be 
preferable to one in which neither an alphabetical, geo- 
graphical, nor frequency order prevailed. 

In the statistical tables of the United States Census involving
geographical distribution, the order of arrangement
of districts is from east to west: New England, Middle Atlantic,
East North Central, West North Central, South Atlantic,
East South Central, West South Central, Mountain,
Pacific. For the number of “Insane in Hospitals on January 1,
1910,” this order is numerically roughly descending; for the
percentage of population born in other divisions of the United
States, the order is distinctly the reverse; and for the percentage
of population under fifteen years of age it is haphazard.¹

The relation between the phenomena described and the controlling
fact in presentation (passage roughly from east to
west) in these cases is not clear. It would be evident, however,
in describing the distribution inland of European immigrants.
Undoubtedly, arguments could be advanced for using
the reverse order in describing the distribution of Asiatics in 
the United States. Railroad time tables invariably observe 
the order of contiguity. Stations are listed not alphabetically 
(except in the index which is not a table) but in the order in 
which they appear on the railroad line. An alphabetical order, 
or one according to size of city, would be of little use to one 
who wished to “catch a train.” The point which it is sought
to emphasize is that, in determining the order of data in 
statistical tables, account should be taken, so far as is possible, 
of the causal relationship or conformity which obtains between 
the facts tabulated and the arrangement of the data used to 
describe them. 

(4) Arrangement According to a Variable Condition 

Wage-rates, income, expense of doing business, prices, inter- 
est rates, etc., are tabulated according to the frequencies with 
which each variation or class of variations occurs. The order 
is determined not by time, nor space, but by amount or degree 
of variation. 
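
An arrangement according to a variable condition may be sketched with the wage classes used in the schedule of the preceding chapter. The wage figures below are invented purely for illustration, and `wage_class` is a hypothetical helper; only the class limits follow the schedule.

```python
from collections import Counter

# Hypothetical weekly wage-rates in dollars, invented for illustration only.
weekly_wages = [2.75, 3.10, 3.95, 4.00, 4.50, 4.99, 5.25,
                6.80, 6.10, 9.00, 25.00, 31.50]


def wage_class(wage):
    """Label the one-dollar class a wage falls in; both ends are open classes."""
    if wage < 3:
        return "Under $3"
    if wage >= 25:
        return "$25 and over"
    lower = int(wage)  # e.g. 4.50 falls in "$4 to $4.99"
    return f"${lower} to ${lower}.99"


counts = Counter(wage_class(w) for w in weekly_wages)

# The classes are listed in order of amount, not of time, space, or alphabet.
ordered_labels = (
    ["Under $3"]
    + [f"${d} to ${d}.99" for d in range(3, 25)]
    + ["$25 and over"]
)
for label in ordered_labels:
    if counts[label]:
        print(f"{label:<16}{counts[label]:>4}")
```

The order of the lines is fixed by the amount of the wage-rate, and the body of the table records the frequency with which each class occurs.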

¹ “Insane and Feebleminded,” 1910, United States Bureau of the
Census, Washington, D. C., 1914, p. 18.
(5) Arrangement According to Alphabet

No sacredness inheres in any order of arrangement except 
the alphabetical. But even this has its limitations. The 
industrial accident rate, for instance, is not necessarily highest 
in the “A” states, nor suicides and divorces lowest in the
and “W” states. It is hardly to be expected that the order of 
the letters in the alphabet will be of significance as a basis 
for distributing statistical data. And yet, this order of ar- 
rangement is frequently followed where others would be pref- 
erable. Such an arrangement is of merit as a device for identi- 
fication and ready reference, but rarely otherwise. 

The most emphatic parts of a statistical table are its be- 
ginning and its end. Accordingly, an ascending or descending 
order of arrangement is desirable in this respect. Where time, 
space, and frequency relations obtain, however, such an ar- 
rangement cannot be used. Moreover, no particular arrange- 
ment is best suited for all purposes. In tabulating mortality 
rates from tuberculosis, for instance, there would probably be 
an advantage in listing the districts affected according to popu- 
lation density, yet such an arrangement would not be suitable 
for all uses to which the data might be put. Nationality, mode 
of life, and earnings of those affected might be of more signifi- 
cance as a basis for grouping them. In such cases, the best 
order of arrangement will not be one but many. The thing 
that should not obtain is the absence of any causal or related 
order, and this frequently occurs when attention is not given 
to this detail. 

Tables 3, 4, 5, and 6, showing different types of statistical 
data, illustrate varying orders. They should be studied to 
determine what, if any, considerations have controlled the 
arrangement. In Tables 3, 5, and 6, the occasions for using 
the particular orders are clear, at least for most of the classes.
In Table 4 the arrangement is logical, although the basis is not 
so evident. 
TABLE 3

Number of Employes of Railroads in Service June 30, 1913.*

Class                        Number

General officers               4,398
Other officers                10,706
General office clerks         84,267
Station agents                37,721
Other station men            167,450
Enginemen                     67,026
etc.                          etc.


TABLE 4

Railway Freight Cars, Number in Service, 1913.†

Class of Car                 Number

Box                        1,032,585
Flat                         147,541
Stock                         78,308
Coal                         871,339
Tank                           8,216
Refrigerator                  43,389
etc.                          etc.


TABLE 5

Developed Water Power Resources, Horse-power, 1900, by Drainage Basins.‡

North Atlantic               Horse-power

St. John River                13,681
St. Croix River               20,500
Penobscot River               70,454
Kennebec River                63,936
Androscoggin River           123,455
Presumpscot River             20,569
Saco River                    25,332
Merrimac River               161,333
Connecticut River            292,899
Blackstone River              31,435
etc.                          etc.


TABLE 6

Number of Deaths in the United States by Causes, 1913.§

Causes of Death              Number

Typhoid fever                 11,323
Malaria                        1,565
Smallpox                         125
Measles                        8,108
Scarlet fever                  5,498
Whooping cough                 6,332
Diphtheria and croup          11,920
Influenza                      7,725
Other epidemic diseases        6,382
Tuberculosis of lungs         80,812
etc.                          etc.


* Statistical Abstract of the United States, 1914, p. 267.
† Ibid., p. 266. ‡ Ibid., p. 21. § Ibid., p. 73.


2. TABULATED DATA CAN BE MORE EASILY REMEMBERED THAN
THOSE WHICH ARE NOT TABULATED

Facts which are possible of association may be more readily 
remembered and compared when logically arranged in a table 
than when descriptively stated. That this is true is keenly 
felt when in order to make a statistical comparison one is required
to read page after page of untabulated figures. The
same amount of detail can generally be arranged in a table
occupying only a fraction of the space and carrying much
more emphasis. Respecting a certain statistical report, one
critic observes as follows: “In some cases even no attempt is
made at tabular presentation. Nine-tenths of the expenditure
underlying statistical work that sees the light in such form
has been wasted, yet some state commissions publish reams
of statistics of this nature every year. * * * Thus the seventh
annual report * * * contains over eighty pages * * * of
closely printed statistical matter presented almost wholly in
running text, without tabular arrangement.” Moreover, rather
than being an aid to the understanding of a body of data, it 
is deadening to have the facts contained in a table duplicated 
without analysis or interpretation. It is, moreover, an ex- 
pensive and ineffective method of attempting to emphasize 
that which seems to be important. 

3. VISUALIZATION OF GROUP RELATIONS IS FACILITATED 

To group like with like into a well-arranged statistical table 
permits a rapid survey and a mental picture to be made of 
data in their different relations. When data are not tabulated, 
both are difficult if not impossible. 

4. A TABULAR ARRANGEMENT MAKES IT EASY TO COMPARE
DATA OF LIKE CHARACTER

To place related items in juxtaposition simplifies comparison 
and suggests studies which would not otherwise be thought of. 

5. A TABULAR ARRANGEMENT FACILITATES THE SUMMATION 
OF ITEMS AND DETECTION OF ERRORS AND OMISSIONS 

Data may be totaled when they are not in tabular form, but 
at considerable sacrifice of time and effort, because the items 
which are to be added are not placed in lines and columns. 
Moreover, omissions of classes and items are not easily detected
unless data are tabulated.²
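
The way in which lines and columns expose errors and omissions may be sketched with the withdrawal figures of Table 1, where the "At close" total of 105 is the sum of 37 and 68. The function `cross_foot` and its messages are hypothetical; the sketch assumes each line is given as a label, a line total, and the component items.

```python
# Withdrawals block of Table 1: each line is (label, total, men, women).
lines = [
    ("During the year", 68, 31, 37),
    ("At close", 105, 37, 68),
]
stated_total = ("TOTAL (deduct)", 173, 68, 105)


def cross_foot(lines, stated_total):
    """Return a message for every line or column that fails to add up."""
    problems = []
    # Each line: the component items must sum to the line total.
    for label, total, *parts in lines + [stated_total]:
        if sum(parts) != total:
            problems.append(
                f"line '{label}': parts sum to {sum(parts)}, total says {total}"
            )
    # Each column: the items must sum to the stated column total.
    for col in range(1, len(stated_total)):
        column_sum = sum(line[col] for line in lines)
        if column_sum != stated_total[col]:
            problems.append(
                f"column {col}: items sum to {column_sum}, "
                f"total says {stated_total[col]}"
            )
    return problems


print(cross_foot(lines, stated_total))      # the complete block
print(cross_foot(lines[:1], stated_total))  # the same block with one line omitted
```

With the block complete, nothing is reported; with a line left out, every column fails to agree with its stated total, so the omission is at once apparent.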

6. A TABULAR ARRANGEMENT MAKES IT UNNECESSARY TO 
REPEAT EXPLANATORY PHRASES AND HEADINGS 

The headings of lines and columns describe the items in a 
table. When the tabular form of presentation is not used, it is 
necessary, each time an item appears, to repeat the details 
which identify it. To do this is costly from the printer’s point 
of view and deadening to the reader. 

If it is desirable to tabulate statistical facts rather than to
express them in running text (that is, to use two rather than
one dimension), then it is also desirable to choose that form of
tabulation which will best express the ideas which it is intended
that the facts should convey.

VI. Types of Statistical Tables 

Statistical tables are of two general types: (1) general,
and (2) summary, derivative, or interpretive. 

General tables are detailed, their purpose being to include, 
so far as is possible, all of the facts which are known about 
the phenomena with which they deal. They are inclusive; 
caption and stub headings are involved and complicated, the 
units in which the data are expressed and the way in which 
they are presented serving to give a detailed account of the 
various properties of the data. They contain the basic “raw 
material,” removed one or more steps from the forms upon 
which it is collected, and constitute the source from which 
summary and derivative tables may be made. 

General tables are prepared when analysis is begun, their 
preparation constituting the first step in the process. They 
are sometimes little more than “working papers,” to be discarded
after they have served their purpose. This is almost


² See Table 1, p. 131.
invariably the case when summaries only are needed, and
when there is no obligation felt to supply details for the pur- 
pose either of informing the public or of providing the means 
whereby summaries may be verified. General tables are costly 
to print and bulky to handle. Moreover, relatively few readers 
are interested in the detail which they contain. They want 
conclusions, or “results,” as they call them. Accordingly, such
tables are frequently omitted from publications, separately 
issued, or placed in appendices. 

Government bodies generally and research agencies occasionally
publish such tables. In doing this they make available
to others material which may be used in various ways.
Interest may not lie in the particular summaries used in a 
statistical report; further or different analysis may be desired. 
In the absence of general tables, this is impossible without 
again collecting or assembling the data. 

But so-called “general tables” carry different amounts of
detail. It is often difficult to tell whether a table is general
or derivative. All tables must of necessity carry some details.
Those of a summary nature, however, relate not so much to
individual instances, narrow groups, and classes, as they do to
totals, averages, ratios, and the like. Summary, derivative, or
interpretive tables are those in which are recorded, not the
detailed data which have been analyzed, but rather the results
of analysis. They are brief; that is, they are in the nature
of a summary. They are drawn from general tables; that is,
they are derivative. They contain the results of an analysis;
that is, they are interpretive. Such tables accompany the
discussion of a body of data, summarizing the relations which
have been found to exist among its various characteristics.

VII. The Tabulation Form 

1. TABLES CLASSIFIED ACCORDING TO THEIR COMPLEXITY 

The form of all tables is a surface, the items being assigned
to compartments in keeping with their characteristics as
defined in the descriptive headings in the caption and stub
divisions. Simultaneously, they are read both horizontally and
vertically. The greater the number of characteristics named
in either caption or stub, the more complex is the arrangement
of the details. On the basis of the number of divisions in
captions and stubs, tables are classified as single, double,
treble, etc.

A single table has one characteristic named in the caption 
and one in the stub. For instance, as in Table 7, the things 
named — real estate mortgages in Wisconsin — are placed in the 
caption, and the viewpoint from which they are presented — 
time — is shown in the stub. 

TABLE 7

Table Showing by Years the Number of Real Estate
Mortgages in Wisconsin

Year       Number of Real Estate
           Mortgages in Wisconsin
Total                —
1922                 —
1923                 —
1924                 —
—                    —


But real estate mortgages may be classified into two or more 
co-ordinate groups, as those taxable and those non-taxable, 
those on urban and those on rural property, etc. Similarly, 
each year may be divided into two or more co-ordinate parts, 
as January to June, inclusive, and July to December, inclusive. 
Tables are said to be double when either the stub or the cap- 
tion contains two co-ordinate parts. Table 8 is an example 
of a double table, the caption being divided into two 
co-ordinate divisions.

TABLE 8

Table Showing by Years the Number of Real Estate Taxable
and Non-Taxable Mortgages in Wisconsin

           Number of Real Estate Mortgages in Wisconsin
Year       Total        Taxable      Non-taxable
Total        —             —              —
1922         —             —              —
1923         —             —              —
1924         —             —              —
—            —             —              —


A double form may be made treble by providing for three
co-ordinate divisions. The co-ordinate classes in Table 9
are “taxable” and “non-taxable” and “number” and “amount.”
The “treble” feature is due to the fact that real estate
mortgages are distinguished (1) as to number and amount,
(2) as to taxable or non-taxable, and (3) as to years.

TABLE 9

Table Showing by Years the Number and Amount of Real
Estate Taxable and Non-Taxable Mortgages in Wisconsin

         Number and Amount of Real Estate Mortgages in Wisconsin
              Total              Taxable           Non-taxable
Year     Number   Amount     Number   Amount     Number   Amount
Total      —        —          —        —          —        —
1922       —        —          —        —          —        —
1923       —        —          —        —          —        —
1924       —        —          —        —          —        —

A quadruple form is secured by providing for two
co-ordinate classes in the caption, and two in the stub; three in
the caption and one in the stub; or one in the caption and three
in the stub. Table 10 shows such a “quadruple” form.

TABLE 10 

Table Showing by Years and by Districts of the State the
Number and Amount of Taxable and Non-Taxable Real
Estate Mortgages in Wisconsin

It will be noticed that the numbers and amounts of taxable
and non-taxable mortgages are given for years and for
districts. Chronology is controlling respecting time; and
numerical consecutiveness, respecting space. Totals are
provided for each year and for all years; for each district and for
all districts. The districts are subsidiary to the years in
tabular arrangement, the former being repeated under each
year and the total for all years, the reason being that it is
desired to compare the districts by years rather than the years
by districts. Had the latter purpose prevailed, the districts
would have been made primary and the years subordinate in
rank. The order of arrangement respecting taxability
emphasizes the direct relations between number and amount. Had
the purpose been to emphasize the relation between taxable
and non-taxable mortgages, the data would have been thrown
into juxtaposition under the superior headings “number” and
“amount.”
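The arrangement just described can be sketched in code. What follows is only an illustration in Python, not the procedure used for Table 10: the records, the district numbers, and the figures are invented, and the function name `build_stub` is hypothetical. It orders the lines of a table with years primary and districts subordinate, and places each total ahead of the detail which it summarizes.

```python
from collections import defaultdict

# Hypothetical records: (year, district, number_of_mortgages).
# The figures are invented purely for illustration.
records = [
    (1922, 1, 10), (1922, 2, 20),
    (1923, 1, 15), (1923, 2, 25),
]

def build_stub(records):
    """Return rows ordered with years primary and districts subordinate.

    Each row is (year_label, district_label, number). Totals come first:
    the all-years block precedes the yearly blocks, and within each block
    the all-districts line precedes the district lines.
    """
    by_year_district = defaultdict(int)
    for year, district, number in records:
        by_year_district[(year, district)] += number

    years = sorted({y for y, _ in by_year_district})      # chronological
    districts = sorted({d for _, d in by_year_district})  # numerical

    rows = []
    for year in ["Total"] + years:
        block = []
        for district in districts:
            if year == "Total":
                value = sum(by_year_district[(y, district)] for y in years)
            else:
                value = by_year_district[(year, district)]
            block.append((year, district, value))
        block_total = sum(v for _, _, v in block)
        rows.append((year, "Total", block_total))  # total ahead of detail
        rows.extend(block)
    return rows

rows = build_stub(records)
for row in rows:
    print(row)
```

Had the purpose been to compare the years by districts, the two loops would simply be interchanged, making the districts primary and the years subordinate.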

The order of arrangement should always be that which will
best develop the relations and sequences which are significant.
As noted below, under “Types of Statistical Series and
Corresponding Tables,”¹ the order and arrangement of data in
tabulation forms should make it clear that their significance was
clearly understood when the tables were planned.

¹ Pp. 157-160.

Of course, more complex tables may be constructed. In fact
there are no limits, except those of expense and statistical
prudence, to the complexity which tabular forms may take.

It is generally wise, however, to construct several tables to
describe complex conditions rather than unduly to burden a
single form. The amount of detail that may be grasped by
the eye is limited. Too complicated tables are confusing and
difficult to interpret. Judgment must be used in this instance
as in all aspects of statistical studies.

2. TABLE STRUCTURE

While there are no hard and fast rules relating to table
structure, to which appeal can be made for guidance in all
cases, the following have been found helpful in getting the
desired results:

(1) Ruling and Spacing of Major and Minor Headings

a. The amount of space assigned to major and minor
headings should be in proportion to their respective
importance.

b. Each subsidiary part should be given less prominence
than its immediate superior. Likewise, the most
subordinate heading should be assigned more space
than that given to an individual item in the body of
a table.

c. All forms should be set off by double lines at the
top and at the bottom, the sides remaining open as
they appear on the printed page. The vertical lines
in the body emphasize and give distinction to the
form of the table. Moreover, tables drawn in this
fashion do not have a box-like appearance.

d. Major totals should be set off by double lines both
horizontally and vertically. When a table is complex
and divisible into two or more distinct parts, the
separate portions may be set off by double lines. The
complexity of form and amount of detail in each case
will suggest the wisdom of modifying these general
rules.


(2) The Positions of Totals

Totals in statistical tables were, until recently, almost 
invariably placed below the detail which they summate. The 
Census Bureau at Washington, some years ago, began con- 
structing tables with totals at the top, and this practice is 
now quite widely followed. There is much to be said in its 
favor. The totals so placed are immediately before the eye 
and are closely associated with the title. Almost invariably
they are of chief interest, and it is desirable to have them
conspicuously placed. With totals occupying this position, 
totaling is upward and toward the left. The sums of totals 
in the lines equal the sums of totals in the columns, the check 
upon the accuracy showing itself in the grand total at the 
extreme left and upper corner of the tabular form. 
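The check described above can be stated in a few lines of code. The sketch below is in Python and rests on assumptions of its own: the layout (grand total in the upper left-hand corner, column totals in the first line, line totals in the first column), the figures, and the function name `cross_check` are all hypothetical.

```python
def cross_check(table):
    """Check a table laid out with totals at the top and at the left.

    table[0][0] is the grand total, table[0][1:] are the column totals,
    and table[i][0] (for i >= 1) is the total of line i.
    Returns True only if every total agrees with its detail.
    """
    column_totals = table[0][1:]
    lines = table[1:]
    line_totals = [line[0] for line in lines]
    details = [line[1:] for line in lines]

    lines_ok = all(t == sum(d) for t, d in zip(line_totals, details))
    columns_ok = all(
        t == sum(col) for t, col in zip(column_totals, zip(*details))
    )
    # The sum of the line totals and the sum of the column totals must
    # meet in the grand total at the upper left-hand corner.
    grand_ok = table[0][0] == sum(line_totals) == sum(column_totals)
    return lines_ok and columns_ok and grand_ok

# Invented figures for illustration.
table = [
    [33, 9, 11, 13],   # grand total, then column totals
    [12, 3, 4, 5],     # line total, then detail
    [21, 6, 7, 8],
]
print(cross_check(table))
```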

(3) Size of Tables and Suitability to the Printed Page

The size of statistical tables is determined largely by
discretion or necessity. General tables as “working papers” may be
of any size desired, the only limitation being the ease with
which they can be handled and the amount of detail which
it is thought wise to crowd into them. If such tables are to
be printed, however, the question of cost is important. Details
which are thought to be necessary as a basis for thorough
analysis may be considered too costly to print. Moreover, the
printed page has its own limitations. It cannot be indefinitely
extended. If general tables accompany the text analysis as
appendices, the printed page fixes the limit of size unless folded
inserts are used. If they are published separately, they should
be kept within reasonable dimensions. Large pages and bulky
volumes are forbidding to the average reader.

Summary, derivative, or interpretive tables, on the other
hand, present no particular problems so far as size is
concerned. They are generally brief and condensed and can be
printed on pages of moderate dimensions. If they are too
large for the width of a page, the length may be used without
serious inconvenience to the reader. If too large for either
dimension, readjustments of caption and stub headings, even
splitting up of the table, are always possible.

From the standpoint of the reader, published tables, so far
as is possible, should be included on a single page. If they run
from page to page, it is necessary either to repeat in full the
caption and stub designations, or to adopt some scheme of
abbreviation or identification which will serve as an
alternative.

(4) The Numbering of Columns and Lines

To number the columns and lines in general tables makes
it easy to show the relationship of totals to their component
parts and to verify the references to them in a text treatment.
Not infrequently it is necessary in text analyses, when
referring to items in detailed tables, to employ awkward descriptive
phrases where it would be easy, by citing line and column
numbers, unmistakably to fix their position. One often
hesitates to verify references to items because of the time involved
in identifying them. The costs and inconvenience of
numbering both columns and lines are so small, while the value is so
material, that it seems desirable to adopt both practices in all
tables in which the amount of detail is large or the form of
the tabular arrangement at all complex.

As an alternative to using guide or margin numbers (line
numbers), some of the United States statistical publications
arrange lines into groups of five. This breaks up the detail
and relieves the monotony of an elaborate table, thus making
it easier to follow, but it does not solve the difficulties in text
analysis of referring to the details in general tables and of
showing the columns which are summarized into totals.
Column numbers, moreover, often help to interpret the
relations between the items in a detailed table. These are not
always self-evident even to those experienced in statistical
study.

VIII. The Contents of Tables 

The contents of a table, obviously, have to do with the 
purpose which it is intended to serve. If it constitutes a form 
of record only, the data will be detailed ; if it serves as a type 
of analysis, they will be abridged and summarized. Whatever 
the purpose, the contents should be determined in keeping with 
the following rules: 

(1) They should relate solely to the purpose in mind. 

Extraneous materials should not be included: they detract
from those which are of interest. Moreover, the relations of
those which are included to the purpose to be accomplished by
including them should be evident. Every table should be
easily understood, and the relation of each part to the whole
and to the other parts be apparent.

(2) The items should be accurately distributed and the 
totals correctly summated. 

Totals are but the functions of the items which compose
them. They are generally no more accurate than the items
unless errors compensate each other. This condition rarely
occurs. Whether it does in a particular case may be
determined only by a study of the units in which the
measurements are made; the purpose, plan, and motive governing their
collection; the interpretation assigned them, etc., topics
described at length above. The discovery of unexplainable
errors in a table itself raises a presumption against the
accuracy of all of the preceding stages through which data have
been carried. Moreover, unless the nature of the error is known and can
be allowed for, it makes doubtful the use of subsequent tables
into which the error may have been carried. A known error
can be corrected; one which is unknown is compromising at
every turn. Totals should be made to cross-check accurately,
account being taken of the possibility that compensating errors
may appear in both lines and columns and still the
cross-check agree. A cross-check is not a complete guaranty that
inaccuracies do not exist within the body of a table.
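A small worked case shows why an agreeing cross-check is not a complete guaranty. The sketch below is in Python; the two bodies of figures are invented, and the function name `margins` is hypothetical. The faulty body has an error in every compartment, yet the errors compensate in both lines and columns, so that every total agrees with the correct one.

```python
def margins(body):
    """Return (line totals, column totals, grand total) for a table body."""
    line_totals = [sum(line) for line in body]
    column_totals = [sum(col) for col in zip(*body)]
    return line_totals, column_totals, sum(line_totals)

# Invented figures. In the faulty body one unit has been shifted
# between compartments in such a way that the errors offset each
# other in every line and in every column.
correct = [[3, 4],
           [6, 7]]
faulty = [[2, 5],
          [7, 6]]

wrong_cells = sum(
    a != b
    for line_c, line_f in zip(correct, faulty)
    for a, b in zip(line_c, line_f)
)

print(margins(correct))   # totals of the correct body
print(margins(faulty))    # identical totals, despite the errors
print(wrong_cells)        # number of compartments in error
```

Only a comparison of the compartments with the original records would bring such errors to light.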

(3) Summary, derivative, or interpretative tables, so far as 
possible, should carry references to (a) the meanings of the 
terms which are employed; (b) the pages from which the 
summaries are taken, and the table, line, and column numbers 
involved; and (c) the scope of the data summarized or 
averaged. 

(4) Statements of the peculiar meaning and limitations of 
statistical tables should closely accompany the tables them- 
selves, be conspicuously placed and clearly stated. 

No one is as well prepared to know the limitations of data,
at each stage of collection and tabulation, as he who
prepares them, and, in justice to all, they should be clearly stated.
The place for an appraisement to appear is where no one can
overlook it.

(5) “Miscellaneous,” “not stated,” and “unclassified
items” in statistical tables should be kept at a minimum.

In case such classes are numerous, it is a wise precaution 
against misunderstanding and a valuable aid in interpretation 
to add an explanatory note showing in a general way their 
contents. Normally, such notes do not immediately accom- 
pany tabular forms, with the result that they are overlooked. 

(6) Tables should be arranged, so far as possible, so that 
items will appear in each compartment named in the caption 
and the stub headings. 

(7) Averages, ratios, etc., should not be made a con- 
spicuous part of general tables. They should be reserved for 
those which are of a summary, derivative, or interpretive 
nature. The two types of tables, of course, are not always 
distinct. In some cases, particularly in brief studies, they 
shade imperceptibly into each other, the same table serving 
both for purposes of record and of summary. In all but the 
briefest studies, however, differentiation can be made and is
desirable. It is far better to have a complete statement of the 
limitations of the data, adequate definitions of the units and 
reasons for the combinations which are made of them given in 
general tables, than it is to dispense with them and have the 
tables filled with averages and percentages. It is the function 
of the statistician to make statistical data as comprehensive 
and full of meaning as they can be made. It is not his pur- 
pose, in connection with general tables, to analyze them: this 
function is reserved for summary tables. Much time, effort, 
and money are wasted in crowding into general tables an 
elaborate network of percentages, averages, and the like. 

IX. Titles for Statistical Tables 

The title of a statistical table should be a brief epitome
of its contents. The most important categories should be
specifically named but no attempt made to include all of the
different characteristics. It is not the purpose of a title
completely to summarize the contents of tables. It should be
short, clearly phrased, well punctuated, and impossible of
double meaning. Titles are generally faulty because of
omissions, improper phrasings, and inverted order. Normally, the
things enumerated in the title should follow the order of the 
superior and subsidiary headings. For instance, if a table has 
to do with wage-rates, classified on hourly, daily, and weekly 
bases, and these are presented by occupations and by districts, 
or by the nationalities of those occupied, then this order should 
be followed in the title. To invert the order is confusing and 
may be misleading. 

Illustrations of faulty titles, omissions of column headings, 
and other details to be guarded against in tabulations might 
be cited at length but the following will suffice for this pur- 
pose. The reader should always be on the lookout for errors 
and bad form in statistical presentation. In this way he is 
able to improve his own methods and to benefit by the 
mistakes of others. 

In Table 11, co-ordinate classes in the caption are not given
equal prominence. These classes are “Fatal” and
“Non-Fatal.” Accordingly, they should be made to appear of equal
importance, the detail of non-fatal accidents being reduced
to a subordinate position.

In Table 12, there are three co-ordinate classes, but this fact
is not apparent from the arrangement of the table.
Moreover, “Lacerations or Abrasions” are placed as subordinate
to “Fingers Cut Off,” and “Hand Cut Off” is placed between
the details of “Fingers Cut Off” and “Total Fingers Cut Off.”
This arrangement is wrong. Moreover, the total should
indicate the number of “individual” accidents, because, for
instance, the loss of four fingers is called four accidents.

Omissions in Column Headings
TABLE 11

The Causes of Accidents Resulting in Infection



The above table should have been constructed thus: 



Misplaced and Confusing Headings and Totals 
TABLE 12 


Jointer Accidents Reported, by Nature of Disability

Faulty Rulings and Misplaced Column Headings
TABLE 13

Accidents Caused by Falls of Workmen — Cause and
Disability

Causes of         Totals  Per cent       Fatal  Loss of  Internal  Fractures  Sprains  Lacerations  Bruises  Burns  Injured
Accidents                 distributions         fingers  injuries                                                   eyes
Total, all causes 1387                   48     2        30        425        384      [illegible]  346      41     1
Falls down        52      3.7            —      —        —         19         15       5            13       —      —
—                 —       —              —      —        —         —          —        —            —        —      —

The total columns should have appeared thus:

Causes of              Total
Accidents      Number     Per cent
                          Distribution
Total          1387       100.00

X. The Mechanics of Tabulation


Before the actual process of tabulation is begun, it is gen- 
erally necessary to prepare data for tabulation. It is almost 
never possible immediately to transfer them from schedules 
or other primary records onto tabular forms. Data must first 
be edited. Errors must be corrected, omitted items filled in, 
conflicting statements harmonized, and consistency secured. 
This does not mean that the data have to be “cooked.” Not 
at all. They are simply reduced to a comparable basis so that 
they may be combined into groups and classes. 

After data are edited, they are frequently “coded,”
individual numbers or letters being assigned each separate group
and characteristic. By the use of such codes, long descriptions
and involved class distinctions are abbreviated, the numbers
and letters standing in place of the original terms and serving
to identify them.
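The principle of coding may be illustrated as follows. The sketch is in Python, and everything in it is assumed for the purpose of illustration: the code book, the class descriptions, and the function names `code_record` and `decode_record` are hypothetical and are not taken from any study cited here.

```python
# Hypothetical code book: each class description is given a short code.
CODE_BOOK = {
    "taxable": "T",
    "non-taxable": "N",
    "urban property": "1",
    "rural property": "2",
}

def code_record(record, code_book=CODE_BOOK):
    """Replace each description in an edited record with its code.

    A description missing from the code book raises KeyError, so that
    unedited or inconsistent entries are caught rather than guessed at.
    """
    return tuple(code_book[item.strip().lower()] for item in record)

def decode_record(coded, code_book=CODE_BOOK):
    """Recover the original terms from their codes."""
    reverse = {code: text for text, code in code_book.items()}
    return tuple(reverse[c] for c in coded)

coded = code_record(("Taxable", "Rural property"))
print(coded)
print(decode_record(coded))
```

Because an unknown description is rejected rather than passed over, the coding step also serves as a further check on the editing which precedes it.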

The coded data are then transcribed onto tabulating cards. 
These may be designed for hand or for machine use, but, 
howsoever employed, they have, among others, the following
characteristics: 

(1) A given space on a card is always reserved for a 
particular entry. 

(2) Each separate card or a series of them has to do with 
a single report, reporting agency, or condition. 

The cards for hand tabulation may be designed at will. 
Those commonly used are either three inches by five inches, or 
five inches by eight inches, the surfaces being divided into 
as many separate divisions as are necessary to include the 
data to be tabulated. Cards of larger size are sometimes used, 
but the smaller sizes permit of greater accuracy and speed in 
sorting. It is difficult to sort large cards for items appearing 
in the central blocks. The arrangement of the parts may 
follow any order, but that which is most logical should be 
chosen. The logical order is generally the same as that 
followed in the questionnaire, although it may be desirable 
at times, in order to group together related items, to choose 
a different arrangement. 

In a recent study¹ six hand and one machine tabulation
cards were necessary to record all of the data available. The
plan of arrangement of the detail on the cards did not follow
that used in the schedules. The basic facts were placed on
Card 1, the others carrying less significant detail. The form
of Card 1 is shown in Figure 2.²

¹ Costs, Merchandising Practices, Advertising and Sales in the Retail
Distribution of Clothing, Bureau of Business Research, Northwestern
University, Prentice-Hall, Inc., New York, 1921.

² The respective letters and numbers refer to subject, page and inquiry.
For instance, the third block, Pop-C-2,2 (1), has reference to population
of city, page 2, inquiry 2 (1).

FIGURE 2

Hand Tabulation Card

[Facsimile of Card 1. A first row of blocks identifies the schedule
and the store; below it are blocks for sales, purchases, inventory,
expenses, and similar items, each keyed to subject, page and inquiry
and each carrying separate lines for 1914, 1918, and 1919.]

Hand cards may be used to advantage when

(1) the number of instances to be tabulated is
comparatively small.

(2) the items are large quantities and when it is necessary
to record exact amounts.

(3) it is desirable or necessary to compute on the cards
ratios or averages.

Tabulation cards suitable for machine use may also be
employed. The best known are the “Hollerith,” furnished by
the Tabulating Machine Company. Both machine and hand
cards are alike in principle, a given position always having
reference to the same fact, but not the same phase in which
it is encountered. The cards are provided in blank, the face
being covered with a series of numbers in lines and columns,
or they are specially prepared to suit a given code system.
In either case, they are used in essentially the same manner
as are the hand cards: that is, sorted into groups or classes
according to the code designations in keeping with a scheme of
tabulation. On machine cards, the presence of an item or an
amount is shown by perforation; on the hand card, by a symbol
or the fact itself. Machine cards are sorted and totaled by
electrical contacts; hand cards, by hand methods.

FIGURE 3 

Machine Tabulation Card 



A facsimile of the machine card used in the study referred
to above is given in Figure 3.¹ Machine tabulation cards may
be advantageously used when

(1) the number of instances to be tabulated is large.

(2) the amounts can be arranged into groups, and only the
group designation indicated.

(3) the class items are mutually exclusive, and can be
indicated by symbols.

(4) all of the data can be placed on one card.

(5) tabulations of the same type are recurrent.

¹ The details about the store to which this card refers are as follows:
This store handles clothing and furnishings only; is not a department
store; is located in a city with population between 20,000 and 40,000 in
1920; has a trading radius with population of 120,000 to 200,000; is on
a corner; on a street car line, not at a transfer point; is on an
interurban line but not at a transfer point; is constructed of brick, and is
fireproof; the building is between 25 and 40 years of age; the total
linear feet of window display space is between 20 and 30 feet; it has
vestibule windows without islands; the depth of the windows is 7 feet;
the depth of the entrance is between 6 and 9 feet; it has no double-deck
windows; uses clothing and hat cabinets and furnishing goods units;
has been in business between 10 and 18 years, all of this time in the
same city, and 6 to 10 years in this building; it has no branches; the
length of floor space above the basement is between 90 and 100 feet;
the width of the floor space is between 30 and 40 feet; the height of the
lowest floor above the basement is 15 feet; is located entirely on the
first floor with between 3000 and 4000 square feet of floor space;
the basement area is between 1000 and 2000 square feet; has no
mezzanine floor or balcony; the total area of floor space (including stock
rooms) is between 4000 and 6000 square feet.

Some of the advantages of using the “card” system in
tabulation are as follows: (1) Any combination of characteristics
is easily made; (2) each characteristic or amount is always
assigned the same position on the card; (3) the cards are always
available for tabulation.
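The card principle may be imitated in code. The following sketch is in Python and is offered only as an analogy: the field names, the card contents, and the functions `sort_cards` and `tabulate` are invented, and no claim is made that they reproduce the operation of any tabulating machine. Each field always occupies the same position on the card, the cards are sorted first by the more comprehensive group and then by sub-groups, and any combination of characteristics may be counted.

```python
from collections import Counter, namedtuple

# A "card": each characteristic always occupies the same position.
# Field names and values are hypothetical.
Card = namedtuple("Card", ["schedule", "city_size", "store_type", "sales_group"])

cards = [
    Card(1, "A", "clothing", 3),
    Card(2, "B", "clothing", 2),
    Card(3, "A", "department", 3),
    Card(4, "A", "clothing", 1),
]

def sort_cards(cards, *fields):
    """Sort by the first field named, then by each later field in turn."""
    return sorted(cards, key=lambda c: tuple(getattr(c, f) for f in fields))

def tabulate(cards, *fields):
    """Count the cards falling in each combination of the named fields."""
    return Counter(tuple(getattr(c, f) for f in fields) for c in cards)

sorted_cards = sort_cards(cards, "city_size", "store_type")
counts = tabulate(cards, "city_size", "store_type")

print([c.schedule for c in sorted_cards])
print(counts)
```

A different combination of characteristics requires only a different list of field names; the cards themselves are not altered.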

After data have been coded and transcribed onto suitable
cards, they are then sorted according to the characteristics
which it is desired to tabulate. The accuracy with which
punched cards are sorted may be checked by holding the cards
up to the light and noting whether it passes through the
respective holes for the different items. Any obstruction of the
light automatically registers an error in sorting. The accuracy
of the sorting, when done by hand, may be checked by turning
through the cards and scrutinizing each of them for errors.
In order that this may be done conveniently, the cards must
be relatively small and the edges accurately cut. Punched
cards may be employed to advantage even where electrical
machines for sorting or counting are not available. Cards are
sorted first into the more comprehensive groups and
subsequently into the sub-groups provided for in the scheme of
tabulation.

(Note 1 continued)

The store sells boys’ and children’s clothing, men’s furnishings, boys’
and children’s furnishings, men’s hats and caps, boys’ and children’s
hats and caps, work clothing; it does not sell men’s and boys’ shoes,
men’s fur goods, luggage, women’s wear, nor women’s shoes; its sales
of work clothing include overalls, union-alls, denims, cotton suits, jackets,
work shirts; from 2 to 10 per cent of its sales are of palm beach; it
takes its inventory at depreciated values; does not add freight and
other charges to inventory value; does not keep a perpetual inventory
record; does not keep a record of prices nor sizes; uses sales books, a
cash register, no patented system of books of account; does not keep a
daily record of profits; prepares a monthly profit and loss statement;
has its accounts audited annually by outside accountants; charges to
personal account all merchandise taken out for personal use; charges
10 to 20 per cent depreciation on fixtures; pays its buyer, regular- and
extra-salesmen, bookkeeper, window trimmer, advertising man, and
bushelmen straight salaries; does not use P.M.’s; sells goods to its
employes at cost; had sales during 1919 between $140,000 and $180,000.

After the cards have been sorted, the next process is to 
count or add the frequency of the occurrence of each item. 
This may be done in connection with the tabular form when 
direct transcription is made from the schedule or original sheet 
to the table. When large aggregates must be summated before 
tabular entry can be made the process is not easy without 
first listing the facts. The use of adding machines for this
purpose is imperative. It is best to use a listing machine and 
to retain the sheets for future reference. When comparisons 
are to be made, the items on the listing paper may be used in 
computing percentages, averages, etc., for making new com- 
binations, and for cross-checking. 

It is frequently necessary to arrange data into groups and to 
express the occurrence of each item in a frequency table in the 
manner described immediately below. In so doing, the in- 
dividual instance per se is lost sight of. This need is particu- 
larly true respecting data on wages, sales, ages, etc. — cases in 
which it would be difficult, if not impossible, to take account of 
the precise measure of each individual instance. The listing 
or tallying may be done by arranging on the left-hand margin 
of a sheet of paper the groups into which the individual items 
are to be placed, and by tallying off opposite each individual 
group the number of instances occurring. This method has 
the disadvantage of making impossible any check on the ac- 
curacy of the work. An alternative method is to transcribe the 
data to be grouped onto small cards and to arrange them
into groups, thus allowing each group to be checked by rapidly 
running through the cards. This method requires that the data 
be copied, thus allowing error to enter from this source. 
Whichever method is followed, the accuracy of the listing 
should be thoroughly verified. 
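The tallying of items into groups may be sketched as follows. The example is in Python and is an assumption throughout: the wage figures, the class limits, and the function name `frequency_table` are invented for illustration. Each class includes its lower limit and excludes its upper limit, and items falling outside all classes are set aside rather than dropped, so that the count can be verified against the number of items tallied.

```python
from collections import Counter

def frequency_table(values, lower, width, groups):
    """Tally values into `groups` classes of equal `width` starting at `lower`.

    Each class includes its lower limit and excludes its upper limit.
    Values outside the range are returned separately, not discarded.
    """
    tally = Counter()
    unclassified = []
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < groups:
            tally[index] += 1
        else:
            unclassified.append(v)
    table = [
        (lower + i * width, lower + (i + 1) * width, tally[i])
        for i in range(groups)
    ]
    return table, unclassified

# Invented weekly wages, in dollars.
wages = [12.5, 14.0, 15.0, 17.5, 19.99, 20.0, 24.0, 31.0]
table, unclassified = frequency_table(wages, lower=10, width=5, groups=4)

for low, high, count in table:
    print(f"{low} and under {high}: {count}")
print("unclassified:", unclassified)

# Verification: every item must be accounted for.
assert sum(count for _, _, count in table) + len(unclassified) == len(wages)
```

The final line is the check which simple tallying on a sheet of paper does not of itself provide: the frequencies, together with the items set aside, must equal the number of instances.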
XI. Types of Statistical Series and
Corresponding Tables

Statistical series¹ are of three types: (1) historical, (2)
spatial, and (3) condition. Corresponding to each of these
types are the tables in which they are tabulated. Accordingly,
there are tables showing data with respect to (1) their time
relations, (2) their space relations, and (3) the frequency of
occurrence of things or the attributes of things
at a given time and space. These different series with their
corresponding tables require brief consideration.

The controlling factor in tabulations which express histori- 
cal series is, of course, chronology. Normally, the arrange- 
ment is simple and easily comprehended. All of the facts, no 
matter how diverse in frequency or divergent in type, are
tabulated according to time. Only when time is significant, 
however, should chronology dominate the arrangement of sta- 
tistical detail. In cases where it is incidental it should be 
reduced to a subsidiary position. The degree of prominence to 
be given to it depends in each case upon the purpose of the 
table. 

In tabulating space series, the controlling factor in presenta- 
tion is place or location. Variation is seen geographically. 
Chronology has no significance since measurements varying 
in relation to space are taken as of a given time. Table 14 
represents such a series. The data in this table refer to a given 
period of time, and show the methods of wage payment and 
the rates of wages in different municipalities. That is, the table
presents statistical series viewed geographically, an alpha- 
betical arrangement being followed. 

TABLE 14

Table Showing Union Scales of Wages for Plumbers on
October 1, 1913, by Municipalities (Labor Bulletin No. 97,
Mass. Bureau of Statistics, p. 39, Boston, Mass.)

                               Rates of Wages
Municipalities   Hour      Day      Week      Overtime      Sundays and
                                              (hour)        Holidays (hour)
Attleborough     $0.40⅝    $3.25    $19.50    [illegible]   $0.81¼
Beverly            .60      4.80     26.40      .90          1.20
Boston             .62½     5.00     27.50     1.25          1.25
—                  —        —        —         —             —

Of course, a contiguous rather than an alphabetical
arrangement of the cities might have been followed. Such an order
would be preferable to the one followed if the wage-rates were
in any way related to the location of the municipalities.
Moreover, the space units might have been listed according to size,
but only on condition that there were some relation between
the details and the size of the cities. Before any arrangement
is chosen, the relations which it is desired to emphasize should
be clearly determined. Tabulation is rarely the first step in
analysis; frequently it is the last step, the early ones having
been taken in deciding upon the form to be used. A large part
of the exposition necessary to make plain what is intended to
be shown can be obviated if a table on its face unmistakably
reveals its purpose. There is nearly always a best form, and
it is the peculiar function of the person using statistics to
discover it. After all, a table is only a form on which are
recorded relations and sequences.

¹ A “series,” as used statistically, may be defined as things or attributes
of things arranged according to some logical order.

Condition series constitute a third type of statistical series, 
the corresponding tables being known as “frequency tables.”
Variation in size and amount characterizes statistical measure- 
ments of things and their attributes. Uniformity rarely ob- 
tains. The different measurements of natural phenomena are 
distributed about a norm or common measurement when a 
large number of instances are taken, or when sufficient samples 
are chosen purely at random. If, for instance, one were to 
measure the lengths of a number of leaves, chosen at random 
from a particular tree, the different measurements would vary, 
although a most common or characteristic length would be
found. From this, other measurements would deviate, some 
being longer and some shorter. If a large number were taken 
and pure chance governed their selection, the number of those 
having lengths greater than the characteristic or common 
measure would tend to be equal to those having lengths shorter 
than the standard as determined. A tendency toward uni- 
formity of distribution in excess and in defect of a common 
measure characterizes all natural phenomena. 

A similar regularity of distribution results from measuring 
the same thing a number of times. Each measurement is in-
fluenced by the “measuring stick” and by the way in which it
is used. With successive trials, however, the errors due both
to physical and human causes will tend to be eliminated or 
corrected, and a common or characteristic result be secured. 
With pure chance operating, the deviations or “errors” will 
be distributed in excess and in defect of the “true” measure- 
ment in a systematic and regular order, those in excess tending 
to equal those in defect. 

In the measurements of economic phenomena, a like ten- 
dency for variations to be systematically distributed about 
a norm is observed. Wage-rates vary within narrow margins 
for the same type of labor for a given district, and between 
districts the differences are not large. For a given occupation, 
a norm is established. Wage-rates above and below this 
standard are exceptional both as to the amounts and the num- 
ber of individuals receiving them. The foot frontage value on 
a certain residence city street varies only within a narrow 
margin, the amount of deviation from the extremes being rela-
tively small and the frequencies relatively few. Down-town
business blocks have a characteristic height. Few will be
higher than twenty stories, and few less than three stories high.
Most American freight cars have a capacity of from thirty to
fifty tons; very few now in use for freight services have a
capacity of less than fifteen tons, while few are built with a
capacity beyond one hundred tons. The ruling interest rates
on real estate mortgages range from 5 to 6 per cent. Some
loans are made at less than 3 per cent, and a few others at 
more than 10 per cent. The most characteristic rate is prob- 
ably 5 per cent. A norm in such cases tends to be established, 
but it does not obtain in the same rigorous fashion in economic 
as it does in natural phenomena. 

In tabulating such variable phenomena, frequency tables 
are used. Such tables are constructed by listing singly or in 
groups and according to ascending order the units in which 
a phenomenon or condition is measured, and by arranging op- 
posite them the corresponding frequencies with which they 
occur. Tables 15 and 16 will serve as illustrations.
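
The construction just described, in which the units are listed singly in ascending order with their frequencies set opposite, may be sketched in a few lines of Python. The fragment is offered only as an illustration: the observations (rows of kernels counted on twelve ears of corn) and the name `frequency_table` are hypothetical and form no part of the original data.

```python
from collections import Counter

# Hypothetical observations: rows of kernels counted on twelve ears of corn.
rows_of_kernels = [14, 16, 12, 16, 18, 14, 16, 16, 12, 14, 20, 16]


def frequency_table(values):
    """Return (unit, frequency) pairs with the units in ascending order."""
    counts = Counter(values)
    return sorted(counts.items())


table = frequency_table(rows_of_kernels)

# Print the units singly, each with its frequency opposite, then the total.
for unit, frequency in table:
    print(f"{unit:>5} {frequency:>5}")
print(f"{'Total':>5} {sum(f for _, f in table):>5}")
```

Units which do not occur are simply absent from such a listing; whether they should be shown with a frequency of zero depends upon the purpose of the table.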

TABLE 15

Frequency Table Showing Classified Weekly Wages for Employes in All
Manufacturing Industries in Massachusetts, 1912

(27th Annual Report, Statistics of Manufactures of Massachusetts,
1912, p. xxii, Boston, Mass.)

                            Number and Per Cent of Employes
                            Receiving Specified Amounts
Wage Groups                 Number         Per cent

Total                       681,383         100.0
* Under $3 per week           2,266           0.3
* $3 but under $4             5,792           0.9
  $4 but under $5            16,909           2.5
  $5 but under $6            34,070           5.0
  $6 but under $7            52,604           7.7
  $7 but under $8            63,879           9.4
  $8 but under $9            68,787          10.1
  $9 but under $10           75,006          11.0
* $10 but under $12         103,160          15.1
* $12 but under $15         107,677          15.8
* $15 but under $20         104,585          15.3
  $20 but under $25          32,536           4.8
* $25 and over               14,112           2.1

* Note the changing widths of the groups and the treatment of the residuum.
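
The per cent column of Table 15 can be recomputed from the number column, which serves as a check both upon the total and upon the rounding. The following Python sketch does this; the figures are transcribed from the table, while the variable names are merely illustrative.

```python
# Numbers of employes in each wage group of Table 15, lowest group first.
counts = [2266, 5792, 16909, 34070, 52604, 63879, 68787,
          75006, 103160, 107677, 104585, 32536, 14112]

# Per cents as printed in the table, in the same order.
printed_per_cent = [0.3, 0.9, 2.5, 5.0, 7.7, 9.4, 10.1,
                    11.0, 15.1, 15.8, 15.3, 4.8, 2.1]

total = sum(counts)

# Each per cent is the group frequency divided by the total, to one decimal.
computed_per_cent = [round(100 * c / total, 1) for c in counts]

for count, per_cent in zip(counts, computed_per_cent):
    print(f"{count:>8,} {per_cent:>6.1f}")
print(f"{total:>8,} {100.0:>6.1f}")
```

The frequencies add to the stated total of 681,383, and each recomputed per cent agrees with the printed one.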
TABLE 16

Frequency Table Showing the Number of Deaths from All Causes

Registration Area, United States, 1912 (Mortality Statistics, 1912,
p. 11, Washington, D. C., 1913)

                                      Number
Age of Decedent          Total        Male         Female

All ages                 838,251      459,112      379,139
* Under 1 year           147,455       82,834       64,621
* 1 year                  29,713       15,748       13,965
* 2 years                 13,189        6,889        6,300
* 3 years                  8,240        4,392        3,848
* 4 years                  6,042        3,178        2,864
† Under 5 years          204,639      113,041       91,598
  5-9 years               17,274        9,149        8,125
  10-14 years             11,436        6,008        5,428
  15-19 years             20,343       10,525        9,818
  20-24 years             30,997       16,696       14,301
  25-29 years             33,762       18,495       15,267
  30-34 years             33,743       18,929       14,814
  35-39 years             37,916       21,850       16,066
  40-44 years             37,885       22,337       15,548
  45-49 years             39,624       23,638       15,986
  50-54 years             45,496       26,995       18,501
  55-59 years             45,732       26,451       19,281
  60-64 years             51,097       28,637       22,460
  65-69 years             55,492       30,045       25,447
  70-74 years             55,650       29,219       26,431
  75-79 years             50,772       25,808       24,964
  80-84 years             36,678       17,689       18,989
  85-89 years             19,559        9,027       10,532
  90-94 years              7,082        2,997        4,085
  95-99 years              1,493          620          873
‡ 100 years and over         458          169          289
‡ Unknown                  1,123          787          336

* Note the lower groups.
† Note the summary of lower groups.
‡ Note the residuum and the “Unknown.”
When units of measurement are grouped, accuracy of detail
may or may not be sacrificed. If a series is discrete, any
grouping serves to disguise the truth; if a series is continuous,
it may aid in revealing it.

By continuous series are meant those in which measure-
ments are only approximations, within the limits set up, to an
absolute but indeterminate value. By discrete or broken
series, on the other hand, are meant measurements which are
determined by the nature of the units themselves. In con-
tinuous series, measurement is dependent upon the accuracy
with which approximations are made. In discrete series, meas-
urements are determined simply by the nature of the units.

The following series of measurements are discrete: the num- 
ber of rows of kernels on ears of corn; the number of pages 
in books; the number of letters in words; prices at which books 
are sold; the wage- and salary-rates paid to employes; the 
number of “parts” in automobiles.

On the other hand, the following series are continuous: the 
weights of bushels of corn, wheat, etc.; the weights of hogs 
received at Chicago on a given day; the square feet of floor 
space used in grocery stores; the ages of workingmen; the 
length of time it takes different men at the same time or 
place, or the same man at different times or places, to put 
threads on a bolt. 

Both time and space units, as such, are always continuous, 
but the measurements of phenomena in time and space may be 
continuous or discrete. The number of books sold per year, 
for instance, may be determined. The facts are discrete. The 
time in which they are sold, known as a “year,” however, is
continuous. Its limits are arbitrarily determined. On the
other hand, not only may the unit of time but also the measure-
ment which is expressed in time be continuous. Such meas-
urements as temperatures at hourly intervals constitute an
example. Heat and cold exist not as absolute but only as 
relative conditions. 

Similar observations also apply to space measurements.
Space itself is continuous, but the measurements of phenomena
in space refer to things or their attributes which are continuous 
or discrete. Numbers of employes by departments are dis- 
crete; ages, for the same population, are continuous. Again, 
the number of tractors per farm is discrete; the number of 
acres per farm is continuous. 

The distinction between discrete and continuous measure-
ments so far as tabulation is concerned, however, is chiefly of
interest where neither time nor space, but variation at a time 
or within a space is involved. 

The example of a discrete series in Table 17, showing the 
number of real estate mortgages in Wisconsin in 1904, classi-
fied by rates of interest, illustrates the relations between fre-
quencies and units of measurement, and the effect which
different widths of groups have upon the frequencies. 

A study of the distribution shows that the frequencies in 
groups beginning with the half per cents and extending to but 
not including the even per cents are conspicuously less than 
in those beginning with the even per cents and extending to 
but not including the half per cents. The numbers in the 
former groups show not only a greater concentration on the 
even than on the half per cent units, but also a greater con-
centration on the half per cent than on any other fractional
units. The frequencies are determined by the units in which
interest rates are commonly expressed, and there is no reason 
why an equal distribution throughout the widths of the groups 
should be expected. There is nothing in the nature of the mea- 
surements which requires the units to be continuous and 
infinitesimally small. 

As the groups stand in column (a), the piling up of the 
frequencies on the lower side is evident in every case. If they 
are widened, as in column (b), the distribution is still of the
same general character; but the relative degree of concentra- 
tion on the half per cent and other fractional parts cannot be 
determined. Column (b) is distinctly less suggestive for the 
separate groups, but much more so for the complete range than 
is column (a). In the distribution in column (c), with one per
cent groups such as 3½ but less than 4½ per cent, the even
per cents appear in the middle of the groups, the emphasis
assigned to them being theoretically distributed over the whole 
group. This theoretical dispersion does not, however, fit the 
case; the concentration is still on the even per cents, and any 
attempt to distribute it evenly over the whole group conflicts 
with the facts as shown in column (a). For purposes of anal- 
ysis, it is often desirable to place the limits of the groups 
as in column (c), but it is always necessary to remember the 
actual as distinct from the theoretical distribution. 

TABLE 17

Frequency Table Showing the Number of Real Estate Mortgages
in Wisconsin, 1904, Classified by Rates of Interest

TABLE 18

Frequency Table Showing Distribution of the Lengths of
Lobsters *


Lengths in 
Inches 


(Frequency) 

(a) 


% Inch 
Group 

(Fiequency) 
(&) 


% Inch 
Group 

(Frequency) 
( 0 ) 


1 Inch 
Group 

(Frequency) 

(d) 


1 Inch 
Group 

(Frequency) 

(e) 


8 

8 % 

8 % 

8% 

9 

9 % 

9 ^ 

9 % 

10 

10 % 

10 % 

10 % 

11 

11 % 

11 % 

11 % 

12 

12 % 

12 % 

12 % 

13 

13 % 

13 % 

13 % 

14 

14 % 

14 % 

14 % 

15 
15 % 
16 % 
15 % 

16 
16 % 
16 % 
16 % 

17 
17 % 
17 % 
17 % 

18 

18 % 

18 % 

18 % 

19 

20 


6 

2 

3 

3 

143 

35 

241 

55 

614 

61 

632 

45 
568 

43 
307 
11 
414 
8 , 
166 
12 , 
321 
6 , 
146 
2 . 
426 

90 

280 ' 
1 . 

46 
3 , 

103 
1 , 
13 

30 


I 


6 

178 

296 

675 

677 

611 

818 

422 

168 

326 

148 

426 

90 

281 

48 

104 

13 

30 

3 

7 


11 

■181 

810 

688 

918 

433 

489 

153 

516 

281 

161 

14 

33 

7 

4 


14 

474 

1162 

929 

590 

474 

616 

329 

117 

33 

7 

4 


6 

151 

845 

1206 

775 

497 

679 

870 

162 

44 

10 


* The measurements in column (a) are taken from the American
Statistical Association Publications, Vol. 7, p. 60. The original data
are in a monograph by Dr. Francis H. Herrick on “The American Lob-
ster in the United States,” Fish Commission Bulletin for 1895,
In contrast with series such as that given in Table 17 which
is discrete both as to the unit (interest rate) and the mea- 
surement (the number) are those which are continuous in 
one or in both respects. In Table 18, showing the number 
(the measurement) of lobsters of different lengths (the unit 
being length to the nearest quarter of an inch), the unit is 
continuous and the measurement discrete. In classifying these 
Crustacea, the measurements are first distinguished by quarter 
inch differences. When this is done, the frequencies as in
column (a) are unevenly distributed for lengths approximately 
equal. This is contrary to common sense. There is nothing in 
the nature of the case which will explain the large differences 
in the numbers occurring at the units of length indicated. A 
study of the tables shows that the frequencies are concentrated 
on the even and the one half inches. No such concentration, 
however, actually occurs. The reason for the concentration is 
the wish of the one who did the measuring. Arbitrary units of 
length (a continuous fact) were set up, and then the numbers
(a discrete fact) falling at approximately these lengths were 
identified. 

The frequencies in column (a), although they appear to be 
precise and accurate, are in fact inaccurate. Neither in the 
world at large nor in the sample selected for measurement 
does such a condition as there indicated obtain. Indeed, 
greater accuracy from group to group and over the entire 
range of measurements is secured by expressing the frequencies 
in wider groups. This is done in columns (b), (c), (d), and 
(e). It is more correct to say, for instance, that 1152 cases
were encountered measuring 10 to 11 inches in length than
to say that 514 were 10; 61, 10¼; 532, 10½; and 45, 10¾
inches. The thing which distinguishes this distribution from
that of the mortgage interest rates is the unreal concentration
upon even and half inch units. In the case of the mortgage
rates, concentration actually exists and should be preserved;
in the case of the lobster lengths, it is fictitious and should be
smoothed out by widening the groups. Widening the groups
sacrifices accuracy for the mortgage rates; for the lobster
lengths, it helps to realize it.
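
The effect of widening the groups may be sketched in Python with the four quarter-inch frequencies quoted above for lengths of 10 up to 11 inches. The function `regroup` and its arguments are illustrative assumptions, not a procedure given in the source; only the four frequencies come from the text.

```python
from fractions import Fraction

# Quarter-inch frequencies for lengths of 10 up to 11 inches, as quoted in the text.
quarter_inch = {
    Fraction(10): 514,
    Fraction(41, 4): 61,    # 10 1/4 inches
    Fraction(21, 2): 532,   # 10 1/2 inches
    Fraction(43, 4): 45,    # 10 3/4 inches
}


def regroup(frequencies, width, origin):
    """Sum frequencies into groups of the form 'lower but less than lower + width'."""
    grouped = {}
    for length, count in frequencies.items():
        # Lower limit of the group into which this length falls.
        lower = origin + ((length - origin) // width) * width
        grouped[lower] = grouped.get(lower, 0) + count
    return dict(sorted(grouped.items()))


half_inch = regroup(quarter_inch, Fraction(1, 2), Fraction(10))
one_inch = regroup(quarter_inch, Fraction(1), Fraction(10))

print(list(half_inch.values()))  # two half-inch groups
print(list(one_inch.values()))   # one one-inch group
```

The quarter-inch frequencies swing between 514 and 61, whereas the two half-inch groups hold 575 and 577 cases, and the one-inch group holds the 1152 cases named in the text.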
In fixing the number, the widths, and the origin and termi-
nation of groups representing continuous series, the aim should
be to (1) leave no group unrepresented by frequencies, (2) 
provide for a gradual distribution of the instances through 
the groups, (3) permit the frequencies gradually to reach 
a maximum and “tail off” to a minimum, and (4) have 
the widths exceed the differences observed in the measure-
ments.

In frequency distributions, both of discrete and of con- 
tinuous series, it is desirable to make the groups of equal 
width. If this rule cannot be followed because the use of equal 
sized groups (1) is too detailed for some and not detailed
enough for other frequencies, (2) results in securing a distri- 
bution not properly descriptive of the frequencies over their 
entire range, (3) would leave some groups vacant, etc., then 
the larger groups should be multiples of the smaller ones. 
While the larger ones cannot be broken up, the smaller ones 
can be combined when comparisons are desired. 

Table 19, showing the distribution of wage-rates of operators 
in woolen and worsted mills in the United States, illustrates 
the use of unequal groups and suggests the errors into which 
one may be led through their use. 

By ignoring the widths of the groups and assuming them 
as equidistant — a likely thing to do unless one is accustomed 
to studying such data — it appears that the regular descending 
order of the frequencies for both males and the total is 
abruptly broken at the frequency 2604 for the total, and at 
2109 for the males, thus giving a new point of concentration 
of the wage earners. The larger numbers of frequencies, of 
course, are due to the use at this point of wider groups. This 
table can only rightly be interpreted if full account is taken 
of the fact that the distribution applies to groups with limits 
of 2, 5, 6, 10, and 15 cents, as well as to one group which is 
open at the upper side. If the table had been properly con-
structed, the order of the units (hourly rates of wages) would
have been inverted, and uniform size groups, or groups which
are reducible to multiples of each other, used. When different
sized groups are used, breaks should be made in the body of
the table to call attention to the fact. 
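
One way of taking full account of the differing widths is to divide each frequency by the width of its group before comparing. This device is not used in the text and is offered here only as a sketch, with the totals of the closed groups of Table 19. The figure for the 30 to 34.99 cent group is taken as 468, the value consistent with the male and female columns (451 + 17) and with the column total of 30,454. The two open groups are omitted because their widths are indeterminate.

```python
# (lower limit, upper limit, total operatives) for the closed groups of
# Table 19, hourly rates in cents, each read as "lower but less than upper".
groups = [
    (6, 8, 661), (8, 10, 2722), (10, 12, 6153), (12, 14, 6007),
    (14, 16, 4926), (16, 18, 2635), (18, 20, 1682), (20, 25, 2604),
    (25, 30, 2004), (30, 35, 468), (35, 45, 291), (45, 60, 109),
    (60, 75, 60),
]


def per_cent_of_width(groups):
    """Return (lower, upper, operatives per one cent of width) for each group."""
    return [(lo, hi, n / (hi - lo)) for lo, hi, n in groups]


densities = per_cent_of_width(groups)

for lo, hi, density in densities:
    print(f"{lo:>2} but less than {hi:>2}: {density:8.1f} per cent of width")
```

Read in this way, the apparent second point of concentration at 2,604 disappears: 1,682 operatives in a two-cent group are 841 for each cent of width, while 2,604 in a five-cent group are about 521.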

In writing the limits of groups, a smaller fraction of the 
whole unit should not be used than was employed in the actual 
process of measurement. For instance, wages measured in 
cents should not be expressed in groups reading in fractional 
parts of a cent. Likewise, if measurements are made to the 
nearest half inch, the limits of the groups should not be indi- 
cated by quarter inches. Moreover, it is desirable, in order 
to guard against confusing the upper limits of a lower group 
with the lower limits of an upper group, to avoid writing the 
two in the same form. For instance, the group “30 to 40”
should be written “30 but less than 40.” In this form, it is 
clear that a frequency of 40 belongs in the group 40 but less 
than 50. 
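
The rule that an item of 40 belongs in the group “40 but less than 50” can be stated exactly in code. The Python sketch below assumes three hypothetical groups of equal width; the names `lower_limits`, `width`, and `group_label` are illustrative only.

```python
from bisect import bisect_right

# Hypothetical groups: 30 but less than 40, 40 but less than 50, 50 but less than 60.
lower_limits = [30, 40, 50]
width = 10


def group_label(value):
    """Return the label of the group whose lower limit is included and upper limit excluded."""
    index = bisect_right(lower_limits, value) - 1
    if index < 0 or value >= lower_limits[-1] + width:
        raise ValueError(f"{value} falls outside the groups provided")
    lower = lower_limits[index]
    return f"{lower} but less than {lower + width}"


print(group_label(39.99))
print(group_label(40))
```

Because the lower limit is included and the upper limit excluded, no item can fall into two groups and none on a boundary is left unassigned.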

TABLE 19

Frequency Table Showing the Number of the Operatives in
Woolen and Worsted Mills in the United States, by Sex
and by Hourly Rates of Wages

(Report of the Tariff Board on Schedule K, Vol. IV, part 5. House
Document No. 342, 62d Congress, 2d session, p. 997)

Hourly Rates of Wages      Total       Males       Females

Total                      30,454      17,343      13,111
75 cents and over              33          33           -
60 to 74.99 cents              60          59           1
45 to 59.99 cents             109         106           3
35 to 44.99 cents             291         287           4
30 to 34.99 cents             468         451          17
25 to 29.99 cents           2,004       1,849         155
20 to 24.99 cents           2,604       2,109         495
18 to 19.99 cents           1,682       1,142         540
16 to 17.99 cents           2,635       2,036         599
14 to 15.99 cents           4,926       3,729       1,197
12 to 13.99 cents           6,007       3,186       2,821
10 to 11.99 cents           6,153       1,453       4,700
8 to 9.99 cents             2,722         757       1,965
6 to 7.99 cents               661         133         528
Less than 6 cents              99          13          86
Table 20 illustrates a flagrant violation of these principles.
The upper boundaries of the second and ninth groups are in-
definite. According to the way in which they are stated, items
of 3 and 21 per cent, respectively, are not to be included, yet
it is certain from the succeeding groups that they are in-
cluded. If they are, the order is an exception to that which
characterizes the majority of the groups. As a result,
one is left in doubt as to what is intended. Moreover, the
groups are so different in size that discredit is thrown upon the
whole table.

TABLE 20

Table Showing the Percentage Relation of the Assessment
of Personal Property to Total Assessment

(Report of the Joint Legislative Committee of the State of New York,
Albany, 1916, p. 260)

Relation of Personal Property Assessment               Width of Groups
to Total Assessment                         Number     in Per Cents

Total                                         53
Less than one per cent                                 Less than one
From one to three per cent                             3*
From four to six per cent                              2†
From six to eight per cent                    10       2†
From eight to eleven per cent                          3†
From eleven to thirteen per cent                       2†
From thirteen to eighteen per cent                     5†
From eighteen to twenty per cent                       2†
From twenty to twenty-one per cent                     2*
Greater than twenty-one per cent                       Indeterminate

* Upper limit included.
† Upper limit not included.

XII. Conclusion 

A detailed summary of this chapter seems unnecessary. The 
aim has been to consider only the most important aspects of 
the subject. The more general phases of classification and 
their bearing upon scientific method have for the most part 
been taken for granted.^ They need no extended considera-
tion in this connection. We have striven only to show the
application of classification to statistical facts. 

The technique of tabulation has been approached with the 
problem of the statistician in view, the aim being to call at- 
tention to and to warn against certain indefensible practices 
commonly followed and at the same time to formulate, as 
nearly as can be done, rules of general application. Attention 
is drawn to the characteristic differences in statistical data 
and to the appropriate methods of showing them in tables. A 
logical background for the existence of tables, and the re- 
ciprocal relation of the point of view from which data are con- 
sidered and the way in which they are presented in tables have 
been emphasized. Tabulation is always more than a mechani- 
cal drawing of lines and inserting of numerical symbols. To 
its purpose and technique, too much attention cannot be given. 
CHAPTER VII


DIAGRAMMATIC PRESENTATION

I. Introduction

Amounts and frequencies are tabulated in Arabic or Roman 
numerals; they are illustrated by lines, bars, surfaces, volumes, 
and maps. The facts themselves may be either discrete or 
continuous, and be related to different times, different places, 
or to different conditions at the same time or place. The 
various devices used to illustrate discrete data are treated in
this chapter under the heading Diagrammatic Presentation.
Those used to illustrate continuous series are discussed in the
following chapter, Graphic Representation.

In the chapter on Classification — Tabular Presentation the 
function of a logical classification of statistical data and of 
their arrangement in tables was discussed at length. It was 
learned that primary data must be classified and reduced to 
order from the heterogeneous form in which they are reported, 
while secondary data must be rearranged, separated, combined, 
and worked over to suit the purposes for which they are in- 
tended. Respecting both, the first essential to tabulation is 
classification. The classes into which data fall are arranged 
logically in the order of their importance, the data themselves 
being placed in the lines and columns of tables. Such an ar- 
rangement facilitates study, throws related things together, 
and suggests analysis. Our purpose in this chapter is to con-
trast tabulation with diagrammatic presentation, and to
discuss the value of the various forms of illustration currently 
used for this purpose. 

The purpose of tabulation is to reduce masses of facts to 
logical order according to the units of measurement in which
they are expressed and for the purposes desired. The func- 
tions of diagrams are to illustrate these facts according to the 
order worked out by tabulation. Tabulation is a condition of 
analysis; diagrams are generally illustrations of conclusions 
from analysis. The former is necessary in interpretation; the 
latter are useful in explanation and exposition. Classification 
and tabulation precede; the use of diagrams follows. The 
former clarify the meaning of data; the latter frequently ob- 
scures it. Diagrams can never displace tabulation; they may 
conveniently accompany it if used with discretion. Tabula-
tion alone suggests study and analysis; diagrams alone are
more likely to serve as bases for conclusions arrived at with- 
out study, and to foster a disregard for the details from which 
diagrams are drawn. Careful analysis of tabulated data is 
frequently necessary before their full meaning is divulged; a 
superficial view of diagrams is often gathered from mere 
inspection. 

Diagrams rarely add new meaning to facts which they illus- 
trate. What they do do is to add to the meaning by throwing 
it into relief and by clarifying it. 

It is unwise, as a general rule, to use analogies, but one 
may be hazarded in order to show the dependence and sec- 
ondary character of diagrams in statistical studies. Botanists, 
in classifying plants, use established points of distinction to 
separate them into groups. The common characteristics are 
noted in detail and become the bases for further classification, 
each sample or group of samples being differentiated from the 
others by the presence or the absence of chosen criteria. 
Groups and sub-groups are distinguished and these again are 
studied in the light of the distinguishing marks chosen. This 
process is continued until the points of difference are ex-
hausted, or until some scheme of organization extending
throughout the whole group or groups is discovered. The meth- 
ods of classifying plants are analogous to those of classifying 
statistical data. The common characteristics become the cri-
teria of distinction. Labeling, naming, and mounting botani-
cal specimens are processes analogous to illustrating and
“mounting,” by statistical diagrams, the relations established
through tabulation. The former may exist and be independent 
of the latter in both instances; the latter grow out of and are 
conditioned by the former in all cases. 

What has been said is not meant to detract from the value 
of diagrams as aids in statistical studies. Its purpose has been 
solely to show that they are subordinate to classification and 
tabulation. Diagrammatic illustrations of data can never re- 
place the data themselves, no matter how accurately they tell 
the truth nor how skillfully they are drawn. They are at 
best statistical aids and should be so considered by those who 
use them. A well-drawn and cleverly constructed diagram is 
never a guaranty of the value of the statistical facts which it 
illustrates. 

This contention is supported by a review of the Statistical 
Atlas of the United States. The reviewer, in questioning the 
need of such a volume, raises the point whether it is desirable 
to segregate the illustrations from the tables and text analysis. 
He says: 

“Is the policy of segregation a wise one? Presumably these maps
and diagrams have had and will continue to have their most effective
use in connection with the tables and text with which they were
originally published. To place them in a separate volume with
the barest textual comment seems unduly to burden the graphic
method of presenting facts. Frequently charts and maps greatly
strengthen the textual exposition of a subject; they seldom serve
as a complete substitute for editorial analysis.”^

^ “Statistical Atlas of the United States,” in The American
Economic Review, September, 1915, pp. 648-650, at p. 650.

The psychology of the use of statistical diagrams is worthy
of brief consideration. It is difficult to hold in mind a great
mass of figures. Relations are likely to be obscured in the
effort to remember the amounts themselves. Well-constructed
tables, however, partly compensate for this limitation. But
even when facts are arranged in tabular form, the size of the
items, in all but summary tables, is given chief emphasis. But 
size is seen in its absolute rather than in its relative aspects. 
Degrees of difference between items at the same time, the 
same place, and for different times and places are not easily 
comprehended when data are expressed in quantities. The 
order in which they are arranged may in part compensate for 
the limitations of tabulation, but it cannot entirely overcome 
them. If, for instance, an order of arrangement is according 
to magnitude or frequency, as when districts are arranged in 
the order of amount or number of sales; or if it is consecutive, 
as when loans are listed according to size of interest rates, an
idea of extreme change is readily grasped. The distribution, 
amount, and frequency of change, however, are emphasized 
when they are thrown into relief by some form of diagram- 
matic illustration. On the other hand, when no definite order 
in tabulation is followed, or when the order of arrangement is 
illogical — or, if logical, is not consistently followed — differences 
in time, space, and frequency do not stand out.^ It is to over- 
come these imperfections and limitations of tabular arrange- 
ment, to introduce devices for showing the proportional 
relations between facts, and to emphasize the relations of 
amounts to space, that diagrams of various types are used. 

The power of visualization is only partly realized in tabula-
tion. True, if tabular forms are properly drawn, data are
arranged in lines and columns according to a logical plan.
But relations do not stand out. They may be worked out by
means of percentages and ratios, but such expressions are dif-
ficult to visualize. Absolute and relative differences in in-
terest rates on real estate mortgage loans in Illinois, for in-
stance, may be compared with the frequencies with which the
various rates occur, but it is not easy to relate the rates
geographically to the counties of the state without using a
statistical map. A tabular form in which the counties are
arranged alphabetically may have no logical significance. To
group the counties by rates may not necessarily be to include
contiguous territory. Where space relations are involved, sta-
tistical diagrams help to make them clear. Even where geo-
graphical distribution is not important, they help to show re-
lations, proportions, and sequences.

^ The desirability of having every tabular form determined according
to a definite plan and follow a logical order is developed in the preceding
chapter, pp. 132 ff.

Probably sufficient has been said to indicate in a general 
way that diagrammatic illustration adds something to tabula- 
tion. Just how this is done and in what way by different 
types of diagrams will be made clearer by a discussion of the 
different forms used, the technique of their construction, and 
the psychological basis upon which each rests. 

II. Diagrams for Illustrating Frequency or 
Magnitude Alone 

1. ALTERNATIVE TYPES — GOOD AND BAD FEATURES OF EACH 

The diagrammatic forms commonly used to illustrate 
amounts and frequencies which are discrete are lines, bars, 
surfaces, and volumes. As a class, these are called pictograms. 

Suppose certain data were available concerning the stocks
of merchandise of a retail store. The amounts on hand at
dates of inventory for a succession of years constitute a
discrete time series; the amounts of stock classified by
departments, a discrete place series; and amounts classified
by the methods of taking the inventories, a discrete condition
series. The data are in Table 21.

TABLE 21

Stocks of Merchandise Illustrating Different Types of
Statistical Series

(Time Series)

Years        Amounts on Hand at Date of Inventory, Jan. 31
Average      $210,000
1921          200,000
1922          240,000
1923          220,000
1924          180,000

(Space Series)

Departments  Amounts on Hand at Date of Inventory, Jan. 31, 1924
Total        $180,000
A              60,000
B              40,000
C              80,000
D              50,000

(Condition Series)

Methods of Taking Inventory  Amounts on Hand at Date of Inventory, Jan. 31, 1924
Total                        $180,000
At Cost                        30,000
At “Market”                   110,000
At Appreciated Value            5,000
At Depreciated Value           35,000

To illustrate each of these series, various forms of diagrams
may be used, the parts standing for and being proportional to
the amounts. If lines are used for the time series, for
instance, the amounts may be shown as in Figure 4, a
horizontal arrangement being used and the lines having
no common base. On the other hand, the points of origin
may be made the same in all cases, the lines extending either
horizontally or vertically. The diagrams, respectively, would
then appear as in Figure 5. In place of lines, bars of equal
width (broad lines, in fact) may be drawn vertically or
horizontally.

FIGURE 4

(Lines for the years 1921 to 1924)

FIGURE 5

(Scales: Amounts in Thousands; Years)
FIGURE 9

(Years: 1921, 1922, 1923, 1924)


illustrations, if horizontally placed, would appear somewhat 
as follows: 


FIGURE 10

(Years: 1921, 1922, 1923, 1924)


The facts shown in Table 21 are discrete and separate. 
Neither the times, the places, nor the conditions are dependent 
upon each other. While the amounts by years, by departments 
and by methods of taking inventory constitute series, they are 
unrelated to each other. They are separate identities. More- 
over, because of the fact that relative size alone is illustrated, 
the lines, bars, surfaces, and volumes may have any dimensions 
desired, the only condition necessary to their faithfully illus- 
trating the facts in question being that proportionally they 
bear the same relation to each other. 
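The condition stated above, that the lines or bars need only bear the same proportion to each other as the amounts, may be put in the form of a short sketch. It uses the time series of Table 21; the function name `bar_lengths` and the choice of 48 characters for the longest bar are assumptions made for illustration only.

```python
# Amounts on hand by year, taken from the time series of Table 21.
amounts = {1921: 200_000, 1922: 240_000, 1923: 220_000, 1924: 180_000}


def bar_lengths(values, longest=48):
    """Scale each amount so that the largest bar is `longest` characters."""
    top = max(values.values())
    return {key: round(longest * value / top) for key, value in values.items()}


lengths = bar_lengths(amounts)

# Draw the bars from a common base so that their ends can be compared.
for year, amount in amounts.items():
    print(f"{year}  {'#' * lengths[year]}  {amount:,}")
```

Any other value of `longest` would serve equally well, since only the relation of the bars to each other is being illustrated.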

The same types of diagrams may also be used to illustrate
the component parts of a total. For instance, if it were
desired to make a diagram of the components of the total
inventory on hand January 31, 1924, distinction being made by
departments, lines, bars, surfaces, or volumes could be used.
The line type would appear broken as in Figure 11. If the
bar type were used, it would appear in the form shown in
Figure 12. The length of the bar is equal to the total
inventory, and the lengths of the parts, to the amounts found
in the different departments. The portion in Department A may
be directly compared with the total because both have common
points of origin. Those in Departments B, C, and D cannot be
easily compared with each other or with the total because they
do not have a common base. In this respect they are similar
to the lines and bars, placed horizontally, which illustrate the
inventories on hand in the different years.

FIGURE 11
Departments

FIGURE 12
Departments

(Parts: A, B, C, D)

If bars are used to show component parts at two different
times or places, or under two conditions, then they will appear
in the form shown in Figure 13, different years being used
for illustration. In this case, the respective lengths of the bars
and of their component parts illustrate actual amounts.
Components and the totals in both years may be directly
compared with each other because they have a common base,
and one dimension only, the horizontal, is used. If the same
facts were shown on a relative scale, the diagram would appear
in the form shown in Figure 14. That is, the total inventory
values at the two periods, while quantitatively different, are
treated as equal in distributing on a proportional basis the
amounts in the different departments.

FIGURE 13
Departments

FIGURE 14
Relative Scale
Per Cent
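The conversion to a relative scale may be sketched as follows. The departmental amounts for two years are not given in the text, so the sketch is applied instead to the condition series of Table 21; that substitution, and the function name `percent_shares`, are assumptions made for illustration.

```python
# Inventory of Jan. 31, 1924, by method of valuation (condition series, Table 21).
parts = {
    "At cost": 30_000,
    'At "market"': 110_000,
    "At appreciated value": 5_000,
    "At depreciated value": 35_000,
}


def percent_shares(components):
    """Express each component as a percentage of the total of all components."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


shares = percent_shares(parts)
for name, share in shares.items():
    print(f"{name:<22}{share:6.1f} per cent")
```

Applied to each of two years in turn, the same function would give two bars of equal length, each divided in proportion to its own total.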

Areas are sometimes used to show component parts, but
their use is not recommended. Suppose it were desired to show
by departments the components of the inventories and areas
were used. The figure would appear somewhat as Figure 15,
the total area equalling the complete inventory and the small
areas the respective parts. None of the sections are directly
comparable with each other or with the total; there is no
common base. Moreover, since areas are used, the quantities
are equal to the products of the sides of their respective
rectangles, and cannot be readily compared.

FIGURE 15

(Areas labeled A, B, C, D)


If surfaces or areas are used to show component parts at
two different times or places, or under two conditions, then,
using the illustration shown by bars, the figures would appear
as in Figure 16. In such figures, the dimensions of the total
areas, as well as of those of the parts, vary as the square roots
of the surfaces. Comparisons in such cases are extremely
difficult if not impossible. Figures of this type should not be
used.
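The statement that dimensions vary as the square roots of the surfaces may be checked with a few lines of arithmetic. The sketch below assumes, for illustration only, that two totals from Table 21 are each drawn as a square; the name `square_side` is likewise an assumption.

```python
import math

# Totals from the time series of Table 21, used here only as sample quantities.
totals = {1922: 240_000, 1924: 180_000}


def square_side(area):
    """Side of a square whose area equals the given quantity."""
    return math.sqrt(area)


sides = {year: square_side(amount) for year, amount in totals.items()}
area_ratio = totals[1922] / totals[1924]
side_ratio = sides[1922] / sides[1924]

print(f"ratio of amounts: {area_ratio:.3f}")
print(f"ratio of sides:   {side_ratio:.3f}")
```

The eye, judging by the sides, meets a smaller ratio than the one which the amounts actually bear to each other.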

FIGURE 16

(Two figures, each divided into areas labeled A, B, C, D)

Circles or pie diagrams are also used to show component
relations. For this purpose, they are not recommended. If the
component parts of the total inventories at a given time,
distributed according to departments, are illustrated in this way,
the resulting figures would be as shown in Figure 17.

The total area represents the total inventory; the areas of
the parts, the amounts in the respective departments. Areas
are used in all cases. But the area of a circle is secured by
squaring the radius and multiplying by π (3.1416). When it
is divided into components, the parts appear to stand in the
relation of their respective chords. But this is not the case,
since the smaller the sector, the longer the chord relative to
its corresponding arc, and vice versa. The areas of the sectors
are proportional to their respective arcs, but not to their
respective chords. But it is the arcs which cannot be easily
compared: they are circular, and relative lengths are not
apparent. To be compared, they must be straightened out in
the mind. The ease with which this can be done varies
inversely with their length.

FIGURE 17

All radii of a circle, of course, are equal and the lengths of
the arcs are proportional to the angles at the center. But it
is as difficult to compare the relative sizes of the angles as it
is the lengths of the arcs.
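The relation between chord and arc described above may be tabulated for a few angles. The sketch assumes a circle of unit radius and a handful of sample angles; the function name `arc_and_chord` is an assumption made for illustration.

```python
import math


def arc_and_chord(angle_degrees, radius=1.0):
    """Return the arc length and the chord length subtended by a central angle."""
    theta = math.radians(angle_degrees)
    arc = radius * theta
    chord = 2 * radius * math.sin(theta / 2)
    return arc, chord


# The sector area is proportional to the arc; the chord falls short of the
# arc by a growing margin as the angle increases.
for angle in (30, 60, 90, 120, 180):
    arc, chord = arc_and_chord(angle)
    print(f"{angle:>3} degrees: arc {arc:.3f}, chord {chord:.3f}, "
          f"chord/arc {chord / arc:.3f}")
```

Since the ratio of chord to arc is not constant, a reader who judges sectors by their chords is misled, and more so for the large sectors than for the small.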

The types of figures which one is asked to compare, when
sectors of circles are used to show component parts, are
illustrated in Figure 18. In the part marked “A” the chords are
placed in a straight line. It is apparent from the illustration
that the areas of the sectors have little relation to the chord
lengths, and yet it is these which attract the eye in the pie
diagram. In the part marked “B” tangents, in the form of a
continuous straight line, are drawn to the respective sectors
at points a', b', c', d', the sectors having been separated. The
areas of these figures cannot be readily compared; they are
not graphic. The lower part of Figure 18 shows the respective
lengths of the arcs and of the chords with the differences
between them. The larger the angle, the greater the difference
between the chord and the arc, and vice versa.

FIGURE 18

A pie diagram is a clumsy and defective method of
illustrating component parts; a bar of uniform width, that is, a
one-dimensional figure, is much more satisfactory.

The use of circles or pie diagrams to show component 
parts of things at different times, different places or under dif- 
ferent conditions, is even less defensible. Such an illustra- 
tion as Figure 19 is sometimes used for this purpose. 

FIGURE 19 




It is necessary, in case actual amounts are used, to compare 
(1) the sizes of two circles, (2) the proportions of each taken
up by the different parts, and (3) the comparative sizes of the
parts in one with the corresponding parts in the other. This
is asking too much; it cannot be done. For the eye to compare
the areas of the different parts in the same circle is difficult 
enough; but to compare the relative areas of corresponding 
parts in two circles whose total areas vary as the squares of 
their radii is impossible. Concerning the disadvantages of the 
pie chart, a recent writer says: 
“It is worthless for study and research purposes. In the first
place, the human eye cannot easily compare as to length the various
arcs about the circle, lying as they do in different directions. In
the second place, the human eye is not naturally skilled at
comparing angles, those angles at the center of the circle, formed by
the various rays or radii and subtending the various arcs. In the
third place, the human eye is not an expert judge of comparative
sizes of areas, especially those as irregular as the segments of parts
of the circle. There is no way by which the parts of this round
unit can be compared so accurately and quickly as the parts of a
straight line or bar.” ^

Amounts, frequencies, and component parts cannot be 
readily illustrated by cubical figures, the contents of which 
vary as the cubes of their dimensions. Two quantities such as 
729 and 19,683, for instance, are illustrated by the use of 
bars — one dimension being used — in Figure 20. Cubes show- 
ing the same facts are given in Figure 21. That is, the respec- 
tive dimensions stand in the relation of 9 to 27, or 1 to 3, and 
the contents as 729 to 19,683, or 1 to 27. It is not easy to 
think in terms of three dimensions; by the casual reader, 
volumes are read in one dimension. 
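The arithmetic of the two quantities just cited may be verified as follows. The function name `cube_edge` is an assumption made for illustration; the quantities are those of the text.

```python
quantities = (729, 19_683)


def cube_edge(volume):
    """Edge of a cube of the given volume; rounding absorbs floating-point error."""
    return round(volume ** (1 / 3), 6)


edges = [cube_edge(q) for q in quantities]
edge_ratio = edges[1] / edges[0]
volume_ratio = quantities[1] / quantities[0]

print(f"edges: {edges[0]:g} and {edges[1]:g}")
print(f"edges stand as 1 to {edge_ratio:g}; contents as 1 to {volume_ratio:g}")
```

A reader who takes in only the height of each cube sees a difference of three to one where the facts call for twenty-seven to one.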

Component parts are even more difficult to show by the use
of volumes. In order to determine the dimensions to be used,
it is necessary to take the proportionate parts of the
respective contents, and to extract their cube roots. The resulting
figures are very confusing; they are not graphic.

FIGURE 20

^ Karsten, Karl G., Charts and Graphs, Prentice-Hall, New York,
1923, p. 91.



2. EXAMPLES OF STATISTICAL DIAGRAMS IN CURRENT USE

Various types of diagrams illustrating discrete series are 
given in the following pages. Because of the lack of space and 
the fact that the discussion does not purport to be a treatise 
on diagrammatic presentation, only a few kinds are intro- 
duced. The interested reader may consult with profit the 
books which deal more fully with this topic, reference to which 
will be found at the close of this chapter.

Figure 22 shows the relative prices of a number of farm 
products by years, the articles being distinct and the prices 
unrelated to each other. Separate bars properly illustrate the 
respective relative prices. To have connected them by lines 
would have given an incorrect impression; it would have made 
it appear that the relative heights were in some way dependent 
upon each other. The diagram, moreover, shows a break in 
the time units in which the prices are shown. Data for the
years 1915 to 1919, inclusive, are missing. Accordingly, atten- 
tion is called to this fact by the white area between 1914 and 
1920. Figure 22 illustrates a discrete series in time. 


FIGURE 22 

Diagram Showing Discrete Time Series 



Figure 23 shows a discrete space series, horizontal bars being 
used to show per cent changes in 1923 over 1921. The order
of arrangement is descending. Inasmuch as the facts are dis- 
crete, the bars are distinct and evenly spaced. The “grand 
total” (in fact an average) is removed from the detail by a 
slightly wider space than that used to separate its parts. 

Figure 24 shows another discrete space series. In this dia- 
gram, the areas having an excess of exports are listed in de- 
scending order, and those having an excess of imports in 
ascending order. The total appears at the bottom of the dia- 
gram, removed from the details. 
FIGURE 23

Diagram Showing a Discrete Space Series

(Scale: Per Cent; last bar: Grand Total)



FIGURE 24 

Diagram Showing a Discrete Space Series 

U.S. VISIBLE BALANCE OF TRADE WITH GEOGRAPHICAL DIVISIONS
Figure 25 shows how a discrete space series may be illustrated
by bars, surfaces, and volumes. Absolute and relative
differences are much more apparent in the bars than in either 
of the other forms of illustration. Both may be verified by 
inspection when one dimension is used; when two and three 
dimensions are employed, however, they can be verified only 
by computation. The surfaces vary as the squares, and the 
volumes as the cubes of their dimensions. 

FIGURE 25 

Value of Petroleum and Natural Gas, by States, 1909

(Illustrations of Lines, Surfaces, and Volumes)


Millions of Dollars 



Figures 26 and 27 show solids drawn out of proportion, thus
giving erroneous impressions. Such figures are meant to be
helpful, but they are confusing and absurd. In Figure 26,
absolute amounts for 1904 and 1914, respectively, stand in the
relation of 51.8 to 100. The illustrations show them to be
12.5 to 100. In Figure 27, the relation between the amounts
is 44.3 to 100; the diagrams show it to be as 6.42 to 100.
In both cases, fortunately, the amounts accompany the
diagrams, and the errors can be corrected.
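The size of the distortion may be examined by working back from contents to linear dimensions. The sketch below assumes that the ratios quoted for the drawings (12.5 and 6.42 to 100) are ratios of the volumes of similar solids; that reading, and the function name `implied_linear_ratio`, are assumptions and are not stated in the text.

```python
def implied_linear_ratio(volume_ratio):
    """Ratio of linear dimensions of two similar solids with the given volume ratio."""
    return volume_ratio ** (1 / 3)


# Ratios quoted in the text (smaller amount divided by larger amount).
cases = {
    "Figure 26": {"true": 0.518, "drawn": 0.125},
    "Figure 27": {"true": 0.443, "drawn": 0.0642},
}

for name, ratio in cases.items():
    needed = implied_linear_ratio(ratio["true"])
    used = implied_linear_ratio(ratio["drawn"])
    print(f"{name}: edge ratio needed {needed:.3f}, "
          f"edge ratio implied by the drawing {used:.3f}")
```

On this reading, the smaller solid in each figure would have had to be drawn with edges considerably longer than those actually used in order for its contents to represent the amount correctly.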


FIGURE 26 

Public School Property in 1904 and 1914

(Solids drawn out of Scale)



1904 $89,282,158 1914 $172,316,862

Per Cent of Increase in 10 Years, 93% 

FIGURE 27 


Payments, Account Bonded Debt and Interest, on County 

Bonds 

(Solids drawn out of Scale) 
A discrete condition series is shown in Figure 28, a descending
order (except for the miscellaneous item) in cost per
employee being used. The different industries are separated
by equal spaces, the bars being distinct. The average is placed
at the bottom of the illustration, is removed from the detail,
and indicated by a distinct type of shading. The diagram
ought to have a scale and contain the amounts in tabulated
form.

The bar showing the cost per employee in mining is left 
jagged at the end, thus calling attention to the fact that the 
precise amount is not shown. 



AVERAGE 
FIGURE 29

Diagram Showing Component Parts: Discrete Time Series

U.S. GOV’T INTEREST-BEARING DEBT
Bars are used in Figures 29 and 30 to show component parts
in discrete time series. In Figure 29, the arrangement of the 
bars is vertical, the parts being expressed in quantities; in 
Figure 30 the arrangement is horizontal, both amounts and 
proportions being given. In both cases, since the facts are 
discrete, the bars are distinct and separate. 

The uses to which circles or pie diagrams are put in
illustrating component parts of a whole at a given time, relative
proportions at different times, and different amounts and
proportions at different times were discussed above. The
following diagrams are illustrative of those being used.


FIGURE 31

Pie Diagram Showing Component Parts

The Edison Dollar of Income


Figure 31 shows the distribution of a dollar of income re- 
ceived in 1922 by the Commonwealth Edison Company, 
Chicago, the total area of the circle being 100 per cent, and the 
different segments proportions of the total. 
On the other hand, Figure 32 shows the proportions which
the important items of a family budget constitute at different 
times. For purpose of distribution, the total budgets are 
shown to be equal, the areas of the circles being the same. The 
segments are proportionally but not quantitatively com- 
parable. 

FIGURE 32 

Pie Diagrams Showing Component Parts 

(Percentages of Expenditures for Major Items of Family Budget) 




FIGURE 33

Pie Diagrams Showing Component Parts by Years

NET IMPORTS OF GOLD INTO U.S.

Per Cent of Total From Each Country

(Legend: England, France, Germany, Sweden, All Other. Total $667,000,000)
In Figure 33, amounts varying from year to year are shown
by the areas of circles. Each separate amount is then divided
into its component parts, these being indicated as proportions
of the total. It is difficult to interpret such diagrams. For
instance, the white area (“all other”) in 1923 is smaller than
the corresponding area in 1921, although proportionally it is
larger. Similar observations apply to other segments. The
different parts of the total area in any year are directly
comparable; the same parts in different years are not directly
comparable. Bars either vertically or horizontally placed bring
out the relations much better than do circles.

The use of bars and circles to illustrate the same facts is
contrasted in Figure 34.

FIGURE 34

Production of Petroleum, by Fields, 1909

(Sectors of Circles and Lines)

(Fields labeled: Mid-Continent, Illinois, Appalachian, California, Gulf, Lima-Indiana, Other)
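The difficulty noted in connection with Figure 33, that a segment may be smaller in area in one year although larger as a proportion, may be reproduced with a short sketch. The figures used are hypothetical and are not the data of Figure 33; the function name `circle_radius` is likewise an assumption.

```python
import math

# Hypothetical figures, chosen only to illustrate the effect.
years = {
    "earlier year": {"total": 600.0, "all_other_share": 0.20},
    "later year": {"total": 300.0, "all_other_share": 0.30},
}


def circle_radius(total, unit_area=1.0):
    """Radius of a circle whose area is `total` times `unit_area`."""
    return math.sqrt(total * unit_area / math.pi)


summary = {}
for label, row in years.items():
    summary[label] = {
        "radius": circle_radius(row["total"]),
        "sector_area": row["total"] * row["all_other_share"],
        "sector_angle": 360 * row["all_other_share"],
    }

for label, row in summary.items():
    print(f"{label}: radius {row['radius']:.2f}, "
          f"sector area {row['sector_area']:.1f}, "
          f"sector angle {row['sector_angle']:.0f} degrees")
```

In the later year the sector has the wider angle but the smaller area, so that the reader must weigh angle against radius before any comparison between years can be made.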
In some cases, bar diagrams, varying in two dimensions, are
used to illustrate discrete facts. This is done in Figure 35,
which shows horizontally, by the length of different bars, the
relation of sales in April, 1922, to sales in April, 1921; and
vertically, by the widths of the bars, the relative amounts
presumably sold in April, 1922.^


FIGURE 35

Two-Dimensional Bar Diagram Showing Discrete Condition
Series

(Bars for April: Clothing, Diamonds, Drugs, Groceries, Hardware,
Jewelry, Stationery, Shoes, Dry Goods, Machine Tools)



A similar bar chart using two dimensions is illustrated in
Figure 36. The interesting thing about this figure is that
absolute amounts are shown by widths of bars, lengths in all
instances being identical and constituting 100 per cent. By
cross-hatched surfaces not only are geographical divisions, but
color, race, nativity, and parentage shown for the population
of the United States. The figure admits of being read in two
dimensions the same as a table, yet no confusion results.

^ So far as the form of the chart is concerned, the relative amounts
could be those in either period.

FIGURE 36

Color or Race, Nativity, and Parentage, by Divisions of the
United States, 1910

(Legend: Native White, Native Parentage; Native White, Foreign or
Mixed Parentage; Foreign-Born White; All Other)

Occasionally, but fortunately not often, surfaces within
surfaces are used to show a total and its component parts. An
example of this atrocious practice is shown in Figure 37. In
commenting upon this diagram, the writer using it says: “The
large area represents the approximate annual business of
wholesale druggists of the United States, in round numbers
$400,000,000. The shaded area represents the proportion of
credits to total sales. The black area in the lower left-hand
corner shows losses on total credits.” To this illuminating (?)
statement the reader will instinctively ask: What is the
proportion of credits to total sales? What proportion of total
credits are losses? These questions cannot be answered
because (1) no data accompany the diagram, and (2) no one
will take the trouble to compute the proportions from the
diagram. Such illustrations are worse than useless.

FIGURE 37 

Two-Dimensional Diagram Showing Components by Use of
Surfaces Within Surfaces



The foregoing diagrams, as said above, are illustrative of 
certain types in current use. 

3. GENERAL RULES TO BE OBSERVED IN THE USE
OF STATISTICAL DIAGRAMS 

The need for following a logical and consistent order of
arrangement is as important in illustrating statistical
facts as in tabulating them. For instance, when dealing
with geographical distributions, where contiguity of districts
is important, this order should be followed. Where time is a
factor, it should control the arrangement. As a rule, less
attention is paid to the order of arrangement in illustrations
than in tabulations because violations are not so apparent.
False impressions are given by using an illogical order and by
omitting all concrete data. Deception, if willed, is not difficult
to effect. The apparent is easily confused with the real. It
must be remembered that in the case of diagrams it is the
eye and not necessarily the intellect to which appeal is made.
In this fact lies the chief source of danger in the tendency to
think exclusively in terms of illustrations.

Diagrams of whatever type should be accompanied by the 
data which they illustrate. When this is done, the two
supplement and correct each other. The suggestive power of
diagrams is not interfered with, and at the same time precaution
is taken against the tendency to place reliance in them alone. 
Moreover, the failure to include concrete data may not then 
be used as a partial justification for drawing false conclusions. 
The data not only serve as a record of the thing illustrated 
but also as a test of the accuracy of the illustration. 

When lines or bars are used, their widths generally have
no significance. Sufficient space between them should be
allowed so that they will appear distinct. It is necessary,
however, when data are classified into unequal-sized frequency
groups to use lines of different widths. In such cases, it is the
surfaces and not the linear dimensions which are important.
The widths of lines or bars will then vary with the widths
of groups, but this will not be confusing provided the ordinate
scales are properly indicated, and the surfaces are interpreted
in terms both of length and breadth. To depend on abscissa
scales alone is not sufficient. It is this error which often
explains the misinterpretation of data so grouped. An
illustration of the erroneous conclusions to which people may
be led by failing to take into account the changing sizes of
groups is given in a recent study of the national income tax
returns.^ This failure is common and the reader should be
constantly on the lookout for it when he is interpreting
statistical diagrams.^

^ Roland P., “Income Tax Statistics,” Quarterly Publications
of the American Statistical Association, June, 1915, pp. 52a, 5B7.
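The rule that surfaces, and not heights alone, carry the meaning when groups are of unequal width may be sketched as follows. The groups and frequencies are hypothetical, and the function name `bar_heights` and the unit width of 1,000 are assumptions made for illustration.

```python
# Hypothetical frequency groups of unequal width: (lower limit, upper limit, frequency).
groups = [
    (0, 1_000, 40),
    (1_000, 2_000, 60),
    (2_000, 5_000, 90),
    (5_000, 10_000, 50),
]


def bar_heights(classes, unit_width=1_000):
    """Height per `unit_width` of group interval, so that width times height
    equals the frequency of the group."""
    heights = []
    for lower, upper, frequency in classes:
        width = (upper - lower) / unit_width
        heights.append(frequency / width)
    return heights


heights = bar_heights(groups)
for (lower, upper, frequency), height in zip(groups, heights):
    print(f"{lower:>6,} to {upper:>6,}: frequency {frequency:>3}, height {height:g}")
```

The third group has the largest frequency but not the tallest bar; only by taking breadth together with height does the reader recover the frequency.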

Confusion frequently results from including too much in a 
single diagram, the complexity of detail in whole or in part 
defeating the purpose which it is intended to serve. It is 
well to keep in mind the general rule that for diagrams to be 
effective they must be simple and easily understood. Complex 
relations can generally be more adequately shown by tables 
than by diagrams. In some cases, however, even for those 
which are relatively complex, diagrams are helpful because a 
number of comparisons can be made simultaneously. For 
those who are not accustomed to making and interpreting 
diagrams, however, it is wise to be conservative respecting the 
amount of detail which is crowded into them. There is no 
general and infallible rule respecting this matter, however, 
since much depends upon the idea which one wants to empha- 
size, the type of diagram used, the size of illustrations, the 
skill with which they are drawn, the consumers to whom they 
are addressed, etc. 

In summarizing the discussion of the use of diagrams in 
illustrating statistical facts, attention should be called to the 
appeal which such figures make to the eye, and to the ability 
which they have to make plain relations and sequences which 
in tabular form remain abstract. For instance, a hundred per 
cent becomes significant in a line of a definite length. Like- 
wise, any proportion of this amount is vividly represented by 
a line somewhat shorter than the one which represents the 
whole. Undoubtedly, when both quantities and illustrations 
are used, there results something additional to that which 
comes from using either alone. It is this something which 
has its basis in the psychological truth that the intensity with 
which a thing is perceived varies directly with the number of 
channels through which it makes its appeal. 


^ See illustration in Report No. Industrial Commission of Ohio on
“Industrial Accidents in Ohio, January 1 to June 30, 1914,” Columbus,
Ohio, 1915, pp. 36-37.
III. Diagrams for Illustrating Frequency or Magnitude
in Relation to Spatial Distribution

1. THE PSYCHOLOGICAL BASES FOR THE USE
OF STATISTICAL MAPS

In order to show the relations between magnitude or fre- 
quency and geographical distribution, various types of 
statistical maps are used. As a class, they are known as 
cartograms. It is of interest briefly to discuss the psychologi- 
cal bases upon which their use depends, and to examine the 
different types currently employed. 

The chief function of statistical maps is to show graphi- 
cally amount or frequency in relation to position or space. 
For this purpose they are more satisfactory than tables. Data 
may be spread out geographically and amounts and frequen- 
cies studied in their relative and absolute aspects. Maps, 
moreover, are better suited for this purpose than are picto- 
grams. Comparisons can be made of magnitudes in relation 
to position. The places of absolute and relative concentra- 
tion and dispersion, together with the amount and rapidity of 
change from district to district, near and remote, are thrown 
into relief. Similar comparisons are difficult, if not impossible,
from tabulations alone. The order of arrangement in tabula- 
tion, even if logical and consistent, is fixed. Inspection and 
study may suggest a different order from the one chosen, but 
rearrangement is possible only by retabulation. 

The order in which data are illustrated on maps, while
determined by amount or frequency (varying shades of color or
density of cross-hatching, etc., indicating varying frequencies),
is actually that of contiguity. It is, however, not inelastic.
Comparisons may be made between remote as well as between
contiguous districts. Similarities and differences stand out.
They are shown not only alone and in relation to other
amounts, but also as to positions. It is the introduction of the
spatial concept which gives maps an advantage over tabular
forms and simple pictograms. A new fact is represented: the
fact of position. A contiguous order may be followed in
tabulation, but it lacks the concreteness which the projection
upon a map gives it. The use of statistical maps makes it
possible to visualize positions.

Maps show magnitude and position in different ways, de- 
pending upon the manner in which they are drawn, and the 
nature of the data which they represent. The different types, 
with their respective merits and demerits, are discussed 
below. 

While maps are superior in many ways to tabulations, after 
all, they are secondary and simply illustrative. Classification 
of data precedes their illustration on maps. Illustration is de- 
pendent upon the order, range, and magnitude revealed 
through tabulation. In this respect, they are not different 
from pictograms. They do not stand alone. They support and 
illustrate statistical facts but do not displace them. Hence, 
they should be accompanied by concrete data, and be inter- 
preted in terms of the units of measurement in which they are 
expressed. Not infrequently, all that can be done is to show
the groups into which amounts characteristic of districts fall. 
If they are wide and the amounts dissimilar, it is impossible 
even to approximate exact frequency. To guard against any 
misunderstanding of what is shown, it is essential that the data 
should accompany the map. Their presence makes less likely 
hasty generalizations from appearances, and tends to direct at- 
tention not only to the map which serves to give an impres- 
sionistic view, but also to the data themselves. In the absence 
of the facts, different schemes of illustration may suggest 
radically different superficial interpretations, since not all 
types of maps are equally well suited for all purposes. Choice 
is not a matter to be treated lightly; it is to be determined by 
the nature and distribution of the data, the size and character 
of the groups into which they fall, the number of facts to 
be illustrated, etc. Maps, like simple pictograms, are aids 
in statistical presentation, but they are not indispensable in 
statistical analysis. 
2. TYPES OF STATISTICAL MAPS

Statistical maps are of three general types: (1) those in 
which frequency is illustrated by different colors or by differ- 
ent shades of the same color; (2) those in which different 
shades of cross-hatching are used, the frequency or magnitude 
being indicated by relative densities; and (3) those in which 
various types of dots indicate frequency. 

(1) Colored Maps 

The cost of making colored maps makes prohibitive their 
general use. Moreover, when the groups into which data fall 
are numerous, it is often easier to show gradual changes by 
varying the shades of black and white than it is by using 
separate colors or different shades of the same color. The 
use of different colors accentuates abruptness of change from 
one condition or district to another. Where different shades 
of the same color are used, it is frequently difficult to distin- 
guish between them unless numbers or letters or some other 
identification marks are used. If color combinations are used, 
they should be complementary, the shades changing in har- 
mony with the facts represented. Lighter colors and shades 
should represent one extreme; darker colors and shades, the 
other extreme. 

On the use of colored maps, the following observation is of
interest.¹

“It is a cardinal principle in graphic representation that the visual
impression should correspond directly to the facts as related to one
another. Any scheme of color, therefore, which is not entirely logical,
in a visual sense, is worse than misleading when applied to
phenomena which are to be represented in a graduated series. A
map in which the green, red, yellow, and blue are indiscriminately
used to represent different grades of intensity of suicide, for example,
is fully as difficult to interpret as the statistical tables which it is
intended to elucidate. The only opportunity for representation by
means of unrelated colors is offered in the case of such phenomena,
for example, as the distribution of different nationalities or religions
within a country where no relationship in point of fact between the
several elements exists. . . .

“If colors are to be used at all, they should either be confined
to different intensities of the same color, or else, if the number of
shades be too great, two colors, red and blue, for example, may
be employed, the deepest tints of each standing at the extremes of
the series, and each shading down to an almost white color where
the two join at the median line.”

¹ Ripley, W. Z., “Notes on Map Making and Graphic Representation,”
Quarterly Publications of the American Statistical Association, Vol. 6,
1898-1899, pp. 313-327, at pp. 314-315.

Excellent examples of colored maps may be found in the 
Statistical Atlas of the United States, published by the United 
States Census Bureau. Those who have occasion to use or 
interpret such maps should study them in relation to the 
choice of shades and colors, the varieties of uses to which they 
are put, the readiness and facility with which they may be 
interpreted, etc. 


(2) Cross-hatched Maps

The second type of maps is that in which some form of 
cross-hatching is used to indicate amount or frequency. Figure 
38 is illustrative. Shades may range from white to black, 
extremes in the range of the thing represented being illustrated 
by extreme shades, and the condition which is more common, 
typical, or characteristic by medium shades. The number of 
shades to be used depends upon the number of groups into 
which data are divided. As in tabulation, groups should be 
of uniform size, shades representing equal ranges of units of 
measurement, rather than equal frequencies with which units 
occur. The number of times a shade is used in map making, 
as the frequency with which groups are encountered in tabula- 
tion, depends upon the total frequencies and the number of 
shades and size of the groups chosen. As widths of groups 
in frequency tables, so units of shades in maps should be uni- 
form. When this rule is followed, choice of shades is of minor 
consideration. 
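
The rule that shades should stand for equal ranges of the unit of measurement, and not for equal frequencies, may be sketched in a few lines of Python. This is an illustration only: the function name, the class limits, and the sample percentages are hypothetical and are not taken from Figure 38.

```python
def shade_class(value, lower_limit, class_width, n_shades):
    """Return the 0-based shade index for a value, using classes of uniform width."""
    if class_width <= 0 or n_shades <= 0:
        raise ValueError("class_width and n_shades must be positive")
    index = int((value - lower_limit) // class_width)
    # Clamp so that values at or beyond the extremes take the extreme shades.
    return max(0, min(n_shades - 1, index))


# Hypothetical percentages for five districts, shaded in four classes of width 10.
values = [3.0, 12.5, 18.0, 27.0, 39.9]
shades = [shade_class(v, lower_limit=0.0, class_width=10.0, n_shades=4) for v in values]
print(shades)  # [0, 1, 1, 2, 3]
```

Two districts share the second shade and none is forced into a shade merely to equalize the number of districts per class, which is the point of the rule.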



FIGURE 38 

Proportion of Males 10 to 13 Years of Age Engaged in Gainful Occupations, by States, 1910

(Cross-hatched Map)

The foregoing discussion applies primarily to the represen-
tation of a statistical series. Where unrelated and dissociated 
facts are illustrated, as, for instance, the number of consumers 
of a given commodity by districts, unrelated shades may be 
used. In such cases choice is determined largely by the de- 
sire clearly to contrast contiguous territories, and at the same 
time to bring out the detail necessary to the purpose in mind. 

Both color and cross-hatching schemes are restricted to data
of a “discrete” character. Where district boundaries mark
complete changes, the presence or the absence, or the arbi-
trary limits to the operation of a thing illustrated, as do county
or state lines for rates of increase of population, banking
facilities, for instance, changes from district to district appear
abrupt and violent. Such maps give the impression that ab-
solute uniformity prevails within districts, and that changes
occur only between them. For instance, maps illustrating, by
districts, the per capita sales of merchandise; rates of changes
in farm values or crop acreage; the average number of revenue
passengers on street and electric railways per inhabitant, etc.,
must of necessity show uniform conditions within each district.
Breaks appear only at boundaries. Division lines are prede-
termined. Such maps are “discrete” or broken. They should
be used to illustrate only discrete series. When it is as nec-
essary to show distribution by position within districts as it
is between districts, that is, when the series illustrated are
truly continuous, such maps give erroneous impressions. A
more satisfactory method of illustration of both magnitude
and frequency is then found in the so-called dot maps. This
type comprises the third group spoken of above.

(3) Dot Maps

Upon the basis of the kind of dots used, maps may be 
divided into three classes. The first class is that in which the
dots vary in size, each size having a different numerical sig- 
nificance. Such a map is shown in Figure 39. The scale, 
according to which an illustration is to be drawn, having been 



FIGURE 39

Primary Markets for Wisconsin Cheese (American), 1911

determined, exact or approximate frequency is indicated in
each division of such a map by the number and size of dots. 
The principle is different from that followed in cross-hatching 
and coloring. By the use of such dots, actual or approximate 
frequency is indicated within districts; by the use of cross- 
hatching and coloring, only group frequency is illustrated. In 
the former case, each unit of scale may be represented in each 
district; in the latter case, only one unit is so represented, the 
complete scale being shown by the entire map. The deter- 
mining factor in choice of scale, in the first case, is absolute 
frequency; in the second case, for matter arranged in series, 
it is the range of the limits of the measures to which the fre- 
quencies apply. Grouping is not provided for in the case of 
the dots and little or no knowledge of geographical distribu- 
tion is conveyed by exact magnitudes, but only by densities of 
shades which these magnitudes form. Grouping of frequencies 
is the cardinal feature of cross-hatched and colored schemes. 

As a means of graphically illustrating absolute frequency,
such maps are of little value. It is not evident by inspection,
and to determine it it is necessary (1) to count the dots, and
(2) to evaluate them. In this respect, the method defeats its
own end. The process is too tedious and cumbersome. As a
method of roughly indicating geographical distribution, they
are suggestive, but only with respect to density of shade. In
this particular they add nothing to the ordinary cross-hatched
type. Moreover, they may give a false impression, two- rather
than one-dimensional figures making up the scale of values.¹

A circle representing a shipment of cheese of 5,000,000 
pounds from Wisconsin to Illinois is not easily compared with 
one representing a shipment of 1,000,000 pounds into Missouri. 
Again, they are open to the same criticism as cross-hatching 
in that they illustrate uniform conditions within and change 
only between districts. 
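
The difficulty of comparing the two circles can be put in figures. The sketch below assumes, for illustration only, that the dots are drawn with areas proportional to the shipments; the text does not state how the dots of Figure 39 were scaled, and the function names are hypothetical.

```python
import math


def diameter_ratio(larger, smaller):
    """Ratio of diameters when circle AREAS are drawn proportional to the amounts."""
    return math.sqrt(larger / smaller)


def area_ratio_if_diameters_scaled(larger, smaller):
    """Ratio of areas when the DIAMETERS are (mistakenly) drawn proportional."""
    return (larger / smaller) ** 2


ratio = diameter_ratio(5_000_000, 1_000_000)
print(round(ratio, 3))  # 2.236
print(area_ratio_if_diameters_scaled(5_000_000, 1_000_000))  # 25.0
```

Under the area assumption a shipment five times as large is drawn only about two and a quarter times as wide, while scaling the diameters instead would make it look twenty-five times as large. In either case the eye does not read the true five-to-one relation directly.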

The second type of dot maps, as shown in Figure 40, is 
similar to the first. Instead of using different-sized dots to 

¹ The merits of one- and of two-dimensional figures are treated above.



FIGURE 40

Pig Iron Production, by States, 1909

(United States Census, Statistical Atlas)

indicate different amounts, uniform sizes are used, the dots
being shaded to indicate different values. As a rule the
greatest amount is represented by a solid dot, three-quarter,
one-half, one-quarter and other shadings indicating lesser
amounts. Notwithstanding the fact that such maps are much 
in vogue, they have little or no advantage over the cross- 
hatched type. In many respects they are less serviceable. 

The third type of dot maps, as shown in Figure 41, has cer- 
tain merits and at the same time certain limitations. The size 
of the dot is immaterial; the relative frequency with which 
it occurs is all important. Total frequency is secondary, 
though in theory it may be approximated, as in the other 
types of dot maps, by considering the number of dots in con- 
nection with the value assigned them. To approximate total 
frequency is as unnecessary as it is impossible. In most cases 
the number cannot be determined, because the dots cannot be 
identified. Moreover, the value assigned to a dot is largely 
arbitrary, since the purpose of the map is not to record ab- 
solute magnitude but to show relative abundance and scarcity 
in relation to position. The significance of the map is found 
in the relative densities of the dots in different areas. Areas 
of uniform density are not political jurisdictions, as in colored 
and cross-hatched maps, but actual positions, so far as the 
sizes of maps will allow them to be shown. 

This form of illustration gives the impression of gradual
changes from scarceness to abundance, from “highs” to “lows.”
It smooths out the breaks which prevail when cross-hatching
is used. Geographical barriers are ignored in the drawing, but
may be inserted for purposes of study and interpretation. It
is easy to visualize places and degrees of concentration and
“scatteration,” to get a continuous view of distribution. Dot
maps of the third type suggest continuous rather than discrete
series.

No attempt is made to discuss the technique of diagram and 
map construction or to enumerate the variety of uses to which 
diagrams are put by statisticians, publicists, advertisers, 



FIGURE 41 

Number of Swine on Farms and Ranges, April 15, 1910
1 Dot = 2500

manufacturers, etc. Numerous examples of well- and ill-drawn
illustrations, together with a discussion of free-hand
and mechanical cross-hatching, the uses of pins in map making,
preparation of copy for duplicating whether by photographing
or otherwise, etc., are given in Brinton: Graphic
Methods for Presenting Facts.¹ Our interest is more in de-
scribing the functions, discovering and defining the limitations
of diagrammatic presentation in statistical studies than in de- 
scribing the processes of drawing and reproducing diagrams, 
and in indicating their various uses. Such matters are im- 
portant but they are treated very much more fully elsewhere. 

If the reader understands the psychological bases upon 
which diagrammatic illustration rests — if he appreciates the 
position which it occupies with respect to tabulation and other 
steps in statistical analysis, and feels the warning, which it 
has been the purpose of much of the above to sound, the 
primary purpose of this discussion will have been realized. 
The making of diagrams and maps may be left to those who 
have acquired the requisite skill. The determination to use 
them should be in the hands of those who have a correct 
attitude toward their use. 

It may be helpful in closing this discussion to outline a few 
suggestions to be followed in the use of statistical diagrams. 

IV. Suggestions to be Followed in the Use of 
Statistical Diagrams 

(1) Choose illustrations which are least liable to be mis- 
understood, and which most faithfully and correctly interpret 
the facts. 

(2) See that fact and representation agree, and that all 
diagrams are provided with concise, clearly stated, and appro- 
priate titles. 

(3) Avoid figures which must be read in more than one 
dimension. 

¹ Brinton, Willard C., Graphic Methods for Presenting Facts, The
Engineering Magazine, New York, 1914.

(4) Indicate on diagrams the scales of values used, and
where necessary to avoid confusion, the dimension or di- 
mensions which are significant in interpretation. 

(5) Include as a component or as an accompanying part 
of diagrams the concrete data which they illustrate. 

(6) In expressing the different parts of a total, use lines 
or bars rather than sectors of circles. 

(7) In statistical maps representing a series, divide the 
frequencies and not the number of districts or divisions into 
equal parts. 

(8) In statistical maps representing a series, incorporate
as a part of the legend the frequency with which the units of
measurement occur, thus indicating the distribution by map
and by legend.

CHAPTER VIII


GRAPHIC PRESENTATION 
I. Introduction 

In the preceding chapter, the more common types of
diagrams and maps were discussed in their theoretical and
practical aspects. It was said that these illustrations (pictograms
and cartograms) are subordinate in use to tabulation,
coming after it in point of time so far as analysis is concerned,
and that they are particularly suited to illustrate statistical
series which are discrete or broken. In only one type, the
frequency dot map, is there a suggestion of continuity; in the
others, whether showing totals or components, the respective
parts are distinct.

But there is another type of series which is not discrete 
nor broken, but continuous. Series of this nature relate to 
time, to space, and to condition. Time is always continuous, 
but measurements in time may be continuous or discrete. 
Temperature measurements at hourly intervals, for instance, 
are continuous with respect both to the unit (hour) and the 
measurement (degree). Daily receipts of hogs at Chicago, 
on the other hand, constitute a series which is continuous as 
to the unit (day) but discrete as to the number (hogs). The
number of farm tractors by counties, for instance, is a space 
series, continuous as to the unit (county) but discrete as to 
the measurement (number). A series of words classified ac- 
cording to the numbers of letters which they contain — a con- 
dition series — is discrete both as to the unit (number of 
letters) and the measurements (numbers of instances). So, 
also, is a series showing the number of hats in a retail in-
ventory, classified by materials from which they are made.
On the other hand, the number of people who purchase the 
hats, a season later, classified according to size of heads, is 
continuous as to the unit (size) and discrete as to the mea- 
surement (number). 

Now, it is continuous series with which we are concerned 
in this chapter. Diagrams are unsuited to illustrate them. 
Other means are necessary if the illustrations are to be true 
to the facts. Let us see if we can make clear what it is that 
must be illustrated in such series and the ways in which it 
may be accomplished.

We shall begin by taking an example of a frequency series,
continuous as to unit, and discrete as to measurement. An
illustration which will suffice for our purpose is the number of
employes in a factory, classified by age. The case may be
made simple by supposing that an even 100 men were found
with ages as follows:

TABLE 22

Number of Employes in Factory “X,” Classified by Age

Age Groups               Number of Employes

Total                           100

20 but less than 25               4
25  "   "    "   30              12
30  "   "    "   35              40
35  "   "    "   40              20
40  "   "    "   45              14
45  "   "    "   50              10


The numbers in the different age groups might be shown 
diagrammatically in a number of ways, but those in Figure 42 
are typical. 

That is, bars indicating the number in each group may be 
placed horizontally or vertically. These are the conventional 
diagrammatic types of illustrations.

FIGURE 42

Bar Diagrams Showing the Number of Employes in Factory
“X,” Classified by Age

(Two bar diagrams, one with horizontal and one with vertical bars:
age groups 20-25 through 45-50 on one axis, number of employes on
the other. Readings: 20 but less than 25, etc.)




But age is not discrete; it is continuous. The groupings in
the illustration are purely arbitrary, and the numbers de-
pendent upon this grouping. Any other groupings (narrower
or wider, and starting at any “age”) might be chosen. If other
groups are selected, the number in each group will obviously
be different. Moreover, the ages as reported, while presum-
ably expressed to the nearest year (“presumably,” because
of the grouping), are simply approximations to the “true”
age, a period susceptible of infinitesimally small gradations.
The distinct and separate bars show the ages to be discrete
when they are in fact continuous. They should be connected
by a continuous line showing that all of the employes fall
between the ages, 20± and 50±.
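
That the numbers depend upon the grouping may be seen by regrouping the figures of Table 22. The following is a minimal Python sketch; the function name is hypothetical, and the method works only where each new group is made up of whole five-year groups, since the table gives no ages within a group.

```python
# Table 22: lower class limit -> number of employes (five-year groups).
five_year = {20: 4, 25: 12, 30: 40, 35: 20, 40: 14, 45: 10}


def regroup(counts, start, width):
    """Combine class frequencies into wider classes beginning at `start`."""
    merged = {}
    for lower, n in sorted(counts.items()):
        new_lower = start + ((lower - start) // width) * width
        merged[new_lower] = merged.get(new_lower, 0) + n
    return merged


ten_year = regroup(five_year, start=20, width=10)
print(ten_year)  # {20: 16, 30: 60, 40: 24}

shifted = regroup(five_year, start=15, width=10)
print(shifted)   # {15: 4, 25: 52, 35: 34, 45: 10}
```

The same 100 employes yield a largest group of 60 when ten-year groups begin at age 20, and of 52 when they begin at age 15. Neither arrangement is more "true" than the other.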

A similar illustration will show the fundamental error in 
illustrating a continuous time series by a method suitable to 
one which is discrete. Temperature readings at successive 
hourly intervals during a day will serve our purpose. Those
chosen are given in Table 23. Diagrammatically, these read-
ings might be shown by bars as in Figure 43. But such an 
illustration is not true to the facts. While the readings vary 
from hour to hour, neither the temperature nor the time inter- 
vals are discrete. Both are continuous. Bars should not be 
used to illustrate them. The case requires the use of a line 
which will show the gradual and continuous change from one 
temperature level to another. 
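
What a connecting line asserts about the intervals between readings may be sketched in Python, using the morning readings of Table 23. Straight-line interpolation is an assumption made here for illustration; the actual course of the temperature between two hourly readings is not recorded, and the function name is hypothetical.

```python
# Morning readings from Table 23: hour of day -> degrees Fahrenheit.
readings = {8: 58, 9: 61, 10: 64, 11: 65}


def interpolate(readings, hour):
    """Estimate the temperature at a fractional hour by straight-line interpolation."""
    lower = int(hour // 1)
    if hour == lower:
        return float(readings[lower])
    upper = lower + 1
    fraction = hour - lower
    return readings[lower] + fraction * (readings[upper] - readings[lower])


print(interpolate(readings, 9.5))  # 62.5
```

A reading of about 62.5 degrees at half past nine is a reasonable statement for temperature, because both time and temperature pass through every intermediate value. Separate bars at 9 and at 10 would suggest nothing about the interval between them.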


TABLE 23

Temperature Measurements at Hourly Intervals, Chicago,
September 3, 1924

Hours,           Temperature      Hours,           Temperature
Sept. 3, 1924    (Degrees         Sept. 3, 1924    (Degrees
                 Fahrenheit)                       Fahrenheit)

12 Midnight          63           12 Noon              68
 1 a.m.              62            1 p.m.              65
 2                   62            2                   65
 3                   60            3                   65
 4                   59            4                   64
 5                   58            5                   64
 6                   58            6                   64
 7                   57            7                   63
 8                   58            8                   63
 9                   61            9                   63
10                   64           10                   62
11                   65           11                   62


An example of a continuous space series may be treated in 
the same way. Suppose the following data were available 
showing the value of city property in dollars per front foot 
for contiguous lots in a city block: Lot 1, $20; lot 2, $15; 
lot 3, $14; lot 4, $12; lot 5, $14; lot 6, $18; lot 7, $25; and 
lot 8, $40. Such a series is continuous in fact, although as 
customarily stated, it is discrete, because of the failure to 
take account of the gradual change from foot to foot. All 
parts of a given lot are generally assigned the same value. 
Of course, if the division lines between the lots were changed,

FIGURE 43

Bar Diagram Showing Hourly Temperature Readings at 
Chicago, September 3, 1924 



the values would also change. The division lines are arbitrary, 
and the values assigned to the lots depend upon the boundaries 
selected. Truly to represent such a series a continuous line 
would be preferable to a series of bars. 

The foregoing illustrations and the discussion of them are 
intended solely to show why different devices are needed to 
illustrate discrete and continuous series. Both are introduc- 
tory to the more complete discussion of Graphic Presentation 
which follows. 

II. Diagrammatic and Graphic Presentation Contrasted 

Bars, squares, cubes, circles, and similar figures themselves 
represent or stand for quantities singly or in series. Such 
illustrations are diagrammatic. On the other hand, when 
quantities are graphically illustrated, they are not represented
by one or more dimensional figures, but are located on a sur-
face with respect to two or more dimensions. 

The customary method of graphically presenting statistical 
facts is to use a system of rectangular co-ordinates such as
the following:


FIGURE 44 

A System of Co-ordinates



The points P and P' are two facts located in a plane, their
positions being determined by the characteristics indicated on
the two axes, X and Y. The junction of the axes at 0 is known
as the point of origin; the horizontal axis is called the abscissa,
and the vertical axis, the ordinate. All points in the plane
are fixed with reference to these axes. The plus and minus
signs indicate the parts of the “system” in which positive and
negative quantities are placed.

Now, it is evident that to locate quantities or frequencies 
in a plane bounded by two axes in the above fashion is not 
the same as to represent them by lines, bars, surfaces, solids, 
etc. — devices which themselves are drawn proportional to the 
amounts or frequencies involved. One would surely not locate 
quantities or frequencies with respect to these two axes and 
at the same time represent them by figures of various dimen- 
sions. A strange figure, indeed, would be secured if in place 
of the points P and P' squares or cubes were inserted. If this 
were done, the co-ordinate axes would have neither use nor 
meaning. Indeed, it is the function of the ordinate axis to 
indicate quantities or frequencies, and of the abscissa axis 
to locate them with respect to time, space, or condition.¹

In graphic presentation, a system of co-ordinates, such as
the above, is used; in diagrammatic presentation, the co-
ordinates are replaced by the illustrations themselves.

All truly continuous series are properly illustrated by 
graphical as distinct from diagrammatic methods. Such series, 
to repeat, may be measured in time or in space or be repre- 
sented by frequencies of a variable at the same time or place. 
Since time and frequency series are more commonly encoun- 
tered in statistical study, they are given primary attention. 
Let us then begin the study of graphical presentation by con- 
sidering frequency series. 

Time, space, and condition series are contrasted in the chap- 
ter on Classification — Tabulation.² We were there concerned
with the manner in which variables in frequency series should 
be grouped for purpose of tabulation. The problem we found 

¹ It is not correct, therefore, to say that “all statistical diagrams (?)
are representations of points, lines, surfaces, or solids, the position of
which in space are quantitatively defined by a system of co-ordinates.”
Pearl, Raymond, Introduction to Medical Biometry and Statistics, W. B.
Saunders Company, Philadelphia, 1923, p. 105; italics, the author’s.

² Supra, pp. 157-169.

to be different, depending upon whether such series were dis-
crete or continuous. A similar problem occurs when fre- 
quency series are graphically illustrated. It is necessary to 
know to which type a series belongs before illustrating it. 

III. Graphic Presentation of Frequency Series

1. PLOTTING SIMPLE FREQUENCY SERIES

Graphically to present statistical facts, two dimensions are 
used as shown in Figure 44. The horizontal or abscissa axis 
is used for the measurements, and the vertical or ordinate axis
for the frequencies. In any particular case, in order not to 
over-emphasize the extreme frequencies and at the same time 
to dwarf the minor ones, it is necessary, before deciding upon 
the vertical scale, to study the range covered by the frequen- 
cies. Similar observations apply to the horizontal axis. If it 
is divided into units which are too small, the frequencies will 
be too widely dispersed; if in units which are too large, they 
will appear crowded. The respective scales will obviously be 
different for each series of data. There is no absolute standard 
suitable to all cases, yet, as a general rule, it is desirable to
have the horizontal approximately 1½ times as long as the
vertical axis. Experience in scale adjustment is the best 
teacher, however, and a keen sense of form and appearance is 
helpful while gaining this experience. 

Equal distances on either scale should represent equal 
facts.¹ The scales should be divided into units which are
easily comprehended in terms of the rulings of the paper used. 

If paper is ruled in fifths or tenths, for instance, the unit of 
space on the ordinate should be capable of being readily re- 
duced to this basis. Ten small squares should never be made

¹ On the necessity of having a horizontal as well as a vertical zero
base line, see Clark, Earle, “The Horizontal Zero in Frequency Diagrams,”
in Quarterly Publications of the American Statistical Association,
June, 1917, pp. 662-669. This article is reprinted in the author’s
Readings and Problems in Statistical Methods, Macmillan & Company,
New York, 1920, pp. 385-394.

to equal such an amount as 3,333. A given space should equal
some multiple of ten, as 4000, 5000, 6000, etc. The ordinate 
should be labeled in terms of the scale unit and not in terms 
of the successive frequencies which are plotted. Exact fre- 
quencies may be inserted opposite the measurements to which 
they apply if they do not crowd the graph. It is well to place 
them horizontally at the top of the sheet on which the curve 
is drawn. 
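
The advice against awkward scale units such as 3,333 may be illustrated by a small Python sketch that rounds the required unit upward to a single significant figure. This is one possible convention, adopted here for illustration and not prescribed by the text; the function name and the frequency of 33,330 are hypothetical.

```python
import math


def round_scale_unit(max_frequency, divisions):
    """Smallest one-significant-figure unit such that `divisions` units cover the maximum."""
    if max_frequency <= 0 or divisions <= 0:
        raise ValueError("max_frequency and divisions must be positive")
    raw = math.ceil(max_frequency / divisions)   # unit needed before rounding
    magnitude = 10 ** (len(str(raw)) - 1)        # 1, 10, 100, 1000, ...
    return math.ceil(raw / magnitude) * magnitude


# A largest frequency of 33,330 spread over ten divisions would need 3,333 per
# division; rounding upward gives a unit which is easily read from the rulings.
print(round_scale_unit(33_330, 10))  # 4000
```

With a unit of 4,000 the ordinate is labeled 4,000, 8,000, 12,000, and so on, in terms of the scale unit and not in terms of the frequencies plotted.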

The abscissa scale should likewise be divided into equal 
parts. If for any reason successive units are omitted, given 
in greater detail, or grouped irregularly, these facts should be 
plainly indicated by subdividing or widening the unit interval. 
Under no circumstances should one be left in doubt as to the 
precise units to which frequencies apply. Uniformity in the 
size of frequency groups is even more necessary in graphic 
figures than in tabulation, because an unbroken continuity is 
more likely to be assumed in the former than in the latter case. 

(1) Plotting Simple Frequency Distributions of
Discrete Series

The thought was developed above¹ that continuous series
cannot properly be illustrated by diagrams. They are de-
signed for those which are discrete. The reverse is equally
true; discrete series cannot properly be illustrated by methods
which are suitable to continuous series. Yet, in the case of 
frequency series which are discrete, continuous lines rather 
than distinct bars are so commonly (but incorrectly) used 
that it seems necessary to discuss the problem in detail. 

Measurements in discrete series, by custom or otherwise, 
are expressed in the units in which the thing measured is 
shown. Many illustrations of such series have already been 
given. When they are graphically presented, the units on 
the abscissa axis do not represent approximations to exact 
measurements which it is impossible to determine because of 

¹ Pp. 215-218.

the limitations of science, or because all possible measure-
ments are likely to occur within the limits set up. They rep- 
resent actual measurements. The units on the abscissa axis 
assigned to them, therefore, can rarely be accurately repre- 
sented by spaces. They are almost always points or positions. 

Moreover, if lines are used to connect the ordinates, they 
are meaningless. It is true that they aid the eye in compar- 
ing the respective heights of the ordinates, but beyond this 
they serve no purpose. They show a trend of frequencies at 
the positions at which they occur but they do not indicate the 
likely or probable frequencies at every point on the horizontal 
axis, as would be the case with a line describing a continuous 
series. 

This can be made clear by means of examples. A recent 
study showed that certain proposed freight rates per one hun- 
dred pounds from St. Paul and Minneapolis to Sioux City, 
Iowa, were expressed in amounts ending in integers as 
follows: 


TABLE 24

Proposed Freight Rates per 100 Pounds Between St. Paul,
Minneapolis, and Sioux City, Iowa, Ending in Different Integers

Integers        Number of Rates

   0                  6
   1                  3
   2                  7
   3                  4
   4                  6
   5                  4
   6                  6
   7                  4
   8                  6
   9                  4


Suppose this frequency series were graphically illustrated
by a continuous line running from zero to nine, inclusive. It
would then appear that something more than four and some-
thing less than six cases, for instance, occurred between
amounts ending with integers of three and four. But such an
inference would be absurd. There are no integers between
three and four. Accordingly, separate bars, rather than a
continuous line, should be used to illustrate this series.¹
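
The absurdity may be shown with the figures of Table 24. In the Python sketch below, the function names are hypothetical; `line_reading` reproduces what a straight connecting line would appear to show between two integers, while `frequency` admits only the ten classes which actually exist.

```python
# Table 24: final integer of the rate -> number of rates.
rates_by_integer = {0: 6, 1: 3, 2: 7, 3: 4, 4: 6, 5: 4, 6: 6, 7: 4, 8: 6, 9: 4}


def frequency(integer):
    """Frequency for a discrete class; only the ten integers are valid classes."""
    if integer not in rates_by_integer:
        raise KeyError(f"{integer!r} is not a possible final integer")
    return rates_by_integer[integer]


def line_reading(x):
    """Height of a straight connecting line at position x between two integers."""
    lower = int(x)
    fraction = x - lower
    if fraction == 0:
        return float(rates_by_integer[lower])
    step = rates_by_integer[lower + 1] - rates_by_integer[lower]
    return rates_by_integer[lower] + fraction * step


print(frequency(3), frequency(4))  # 4 6
print(line_reading(3.5))           # 5.0
```

The line yields five cases at "three and one-half," a class which does not exist; a request for `frequency(3.5)` is rejected, as separate bars would reject it.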

Table 26 on page 226 shows the number of employes in mer-
cantile establishments classified according to rates of wages
received. This, obviously, is a discrete series. While weekly
wage-rates other than those actually named might have been
paid, there is no basis for assuming that the difference in fre-
quencies between 254 and 4, for $6.00 and $6.50, respectively,
is evenly distributed between these two amounts, or that
there are any persons whatsoever who receive $6.399, for in-
stance. A continuous line connecting the different ordinates
in such a case as this may serve to emphasize the difference,
but it does not establish the distribution between them.

It is customary in illustrating discrete series to represent 
group-widths by spaces on the abscissa axis, to erect ordi- 
nates at their middle points, and to connect them by con- 
tinuous lines. This practice is bad, because it makes it appear 
that there is either an equal distribution of the instances 
throughout the groups, or that they are all concentrated at 
their centers. In most cases, neither condition obtains. There 
is no necessity that such a distribution should hold for a dis-
crete series.² Indeed, any grouping at all for such series is

¹ In an analogous case, the Bureau of Railway Economics, in plotting
the “Monthly Revenues and Expenses per Mile of Line” for the rail-
roads in the United States having operating revenues above $1,000,000,
says, “The points on the vertical lines are of significance only in show-
ing the condition for the particular month. The lines connecting the
points assist in tracing the change from month to month but do not
indicate the trend during the month, nor do they represent cumulative
figures for the period.” “Revenues and Expenses of Steam Roads in
the United States, December, 1915,” Bureau of Railway Economics,
Washington, D. C.

² It is known, for instance, that wage-rates are generally fixed in
round numbers, concentration appearing on 5, and its multiples. See
Table 25, and the following distribution.





likely to be misleading. If possible, each measurement should 
be separately indicated. This, of course, is impossible in 
many cases: some grouping must be used. A graphic figure, 
however, should, so far as possible, faithfully represent the 
facts as they are. It should never imply a distribution which 
does not exist. If it is an error to connect by straight lines
ordinates representing frequencies in discrete series (because
of implications as to distribution), it is a far greater error to
connect them by smoothed lines. If series are discrete, it is
this very characteristic which should be retained: false ac- 
curacy is implied when a smoothed line is used. Only when 
such a line gives an accurate notion of direction at, and change 
between successive measures should it be used. It should not 
be employed as a means of generalizing as to distributions at 
measures not represented. 

It is doubtful if the distribution of interest rates on real 
estate mortgages, for instance, as shown in Chapter V,^ would 
have been materially altered by extending the study over a 
longer period of time, or by including more instances. Smoothing
such curves results in deception. Smoothing may be employed
to remove errors in observation but not to disguise the
truth. The extent to which it does the latter varies directly, 

(Note 2, continued)

Table showing the number of union bricklayers receiving specified
hourly wage-rates in New York State. (Compiled from the New York
Department of Labor Bulletin, Whole No. 65, 1913, pp. 4-C)

Cents per Hour      Number      Per Cent Distribution
Total               13,362            100.00
50                     496              3.71
55                     489              3.66
60                   1,650             12.35
65                   2,391             17.89
70                   7,404             55.42
All other              932              6.97


* Page 164.

TABLE 25

Table Showing the Number of Females and Minors Employed
in 24 Mercantile Establishments in September, 1913,
Receiving Classified Wage-Rates

(“Minimum Wage Legislation in the United States and Foreign
Countries,” Bulletin of the United States Bureau of Labor
Statistics, Whole Number 167, April, 1915, p. 96)

Weekly        Number of Females      Weekly        Number of Females
Wage-Rates    and Minors Receiving   Wage-Rates    and Minors Receiving
              Specified Wages                      Specified Wages

Total              3,189
$ 3.00                20             $14.00               60
  3.50                 -              14.50                2
  4.00                50              15.00              164*
  4.50                18              15.50                2
  5.00                72              16.00               27*
  5.50                 2              16.50               15
  6.00               254*             17.00               14
  6.50                 4              17.50               26
  7.00               311*             18.00               65*
  7.50                48              18.50                4
  8.00               490              19.00                5
  8.50                44              19.50                4
  9.00               441*             20.00               57*
  9.50                 4                  -                -
 10.00               370              21.00                3
 10.50                13              22.00               23
 11.00                72*                 -                -
 11.50                 8              25.00               37*
 12.00               355*             27.50                7
 12.50                16              30.00                9
 13.00                22                  -                -
 13.50                37              35.00                9
                                 Over 35.00                5

* Notice the concentration on even dollar amounts.













for discrete series, with the degree of irregularity characteris- 
tic of the thing measured and with the widths of the groups 
into which frequencies are placed. 

This discussion, however, is in fact out of place. Discrete 
series of the frequency type should be illustrated by diagrams 
— discrete figures. The subject is discussed in this chapter 
only because this fact is so often forgotten or ignored. Con- 
tinuous lines — straight and smoothed — and bar diagrams are 
used indiscriminately to illustrate both continuous and dis- 
crete series. Both principle and consistency are sadly lacking 
in these respects. But they ought not to be. 

(2) Plotting Simple Frequency Distributions Describing
Continuous Series

When plotting continuous frequency series, the case is different.
The units of measurement are arbitrary, the frequencies
being functions of those selected. Accordingly, the
abscissa axis, properly considered, is continuous. The breaks
in it are made for convenience only: they indicate conventionally
“stops,” as it were. They are artificial. If this is so,
then, the ordinates indicating the frequencies at these “stops”
should be connected by smooth lines which suggest continuity
in the thing measured. To regard the measurements actually
made as fully descriptive of such a series, is as incorrect as it is
to assume, in the case of discrete series, that instances occur at 
all possible measurements. Neither is correct. One type of 
illustration fits a continuous, the other a discrete, series. 

In continuous series, since variations from one extreme 
measurement to the other are regular and gradual, not only 
should the ordinates be connected, but the direction of the 
line joining them should be determined by the frequencies at 
successive and at all measures. Such a curve should be free 
from sharp angles, the contour being influenced at each point 
by the relative sizes of adjoining frequencies and by the character
of the complete distribution.

Let us take a continuous frequency series and see how it
would be correctly illustrated graphically. For this purpose
the measurements of the lengths of 327 ears of corn taken at
random from a homogeneous “population” may be selected.^
Measurements are made to the nearest quarter of an inch
and grouped into one-half inch classes. The following table
shows the number of ears falling into the half-inch groups.

TABLE 26

Table Showing the Number of Ears of Corn Classified by Lengths

Length of Ears of Corn      Number of Ears
in Inches                   at Each Length
Total                            327
 3.0                               1
 3.5                               0
 4.0                               1
 4.5                               0
 5.0                               2
 5.5                               3
 6.0                               9
 6.5                               8
 7.0                              12
 7.5                              19
 8.0                              32
 8.5                              40
 9.0                              67
 9.5                              63
10.0                              38
10.5                              21
11.0                               8
11.5                               2
12.0                               1


^ Data taken from Davenport, Eugene, and Rietz, Henry L., “Type and
Variability in Corn,” Bulletin 119, University of Illinois Agricultural
Experiment Station, October, 1907, p. 3.

The precision of the measurements and the widths of the
groups determine the number of ears in each class. If the
measurements had been made to the nearest tenth of an inch
and grouped into quarter-inch classes, as 4.00, 4.25, 4.50, 4.75,
5.00, 5.25, 5.50, 5.75, 6.00, 6.25, etc., then “at 5.75 would be
grouped all ears which measured 5.7 and 5.8, while at 5.00
would be grouped those which measured 4.9, 5.0, and 5.1. In
the long run, this would clearly result in placing more ears at
5.0 than at 5.25, other things being equal. If we should group
measurements taken to the nearest tenth inch in 0.5 inch or 0.3
inch classes, no such difficulty arises.”^
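The effect described in the quotation can be checked by counting how many distinct tenth-inch readings lie nearest to each class centre. The following is only an illustrative sketch: the function name `readings_per_class`, the range of readings examined, and the use of hundredths of an inch as integer units are assumptions made here and are not taken from the source.

```python
from collections import Counter


def readings_per_class(step, width, low, high):
    """Count the possible readings that fall nearest to each class centre.

    All arguments are in hundredths of an inch (an assumed convention that
    keeps the arithmetic in exact integers): `step` is the precision of
    measurement, `width` the class interval, and `low`..`high` the range
    of readings considered.
    """
    counts = Counter()
    for reading in range(low, high + 1, step):
        # Nearest multiple of `width`; no ties occur for the cases below.
        centre = (2 * reading + width) // (2 * width) * width
        counts[centre] += 1
    return counts


quarter = readings_per_class(10, 25, 400, 700)       # quarter-inch classes
half = readings_per_class(10, 50, 400, 700)          # half-inch classes
three_tenths = readings_per_class(10, 30, 400, 700)  # 0.3-inch classes

for centre in (500, 525, 550, 575):
    print(centre / 100, quarter[centre])
```

With quarter-inch classes the centres alternate between three and two eligible readings (three at 5.00, two at 5.25), whereas away from the ends of the range every half-inch class receives five readings and every 0.3-inch class three.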

With the grouping shown in Table 26, it is absurd to assume
that since 40 ears are grouped at 8.5 inches, and 67 at 9.0
in length, there were no ears with lengths between these
measurements. Had they been more precise and the groupings
narrower (thus giving a different distribution from that
shown in the table), each measurement would still have been
an approximation to the “true” length, and the grouping
arbitrary. The unit of measurement is strictly continuous;
any break in it is artificial.

But the ears measured are only a sample of a wider “universe.”
Would the case be different if more cases were taken?
Not at all. There would still be the problem of determining
the length of each ear, and for this purpose an approximation,
no matter how precise, would have to be made. Length
is continuous, and merely increasing the number of cases in 
which it must be determined does not alter the fact that each 
measurement of length is an approximation. 

In order to illustrate graphically the number of ears at 
each of the lengths shown in the groups in Table 26, a con- 
tinuous, smooth line from ordinate to ordinate should be used. 
The case in this respect would be no different if the sample 
were enlarged. 

^ Op. cit., p. 28.

The degree to which continuous frequency series may be
smoothed depends upon the nature of the distributions. If
measurements are accurately made (bias due to personal and
mechanical elements being absent or distributed according to
chance), large deviations from a standard will be less common
than small ones, the measurements tending to be arranged
around an average or norm. This is the case with
distributions approaching the “normal law of error” type.^
According to this “law,” the measurements of phenomena are 
distributed about their averages in a regular and systematic 
manner, when the number observed is large, and when each 
measurement results from a large number of independent 
causes, none of which is of preponderating importance. A 
graphic figure of such a distribution is bell-shaped in form, 
the precise form being dependent upon the degree to which 
chance operates, and upon the number of measurements made. 

The measurements of the lengths of a sufficient number of 
ears of corn would tend to give such a distribution. Indeed, 
it tends to be characteristic of the measurements of all natural 
phenomena. Accordingly, in smoothing distributions of this 
type, account should be taken of the tendency for frequencies, 
as they approach the maximum ordinate or most common 
measurement, to pile up at the upper sides, and as they recede 
from the maximum, to pile up at the lower sides, of the groups 
into which they are placed. Allowance should be made for 
this tendency in smoothing the distributions of the measure- 
ments of a sample, as well as in generalizing as to the distri- 
bution of an entire “population.” 

In the illustration of the lengths of ears selected, 240 cases 
occur in the groups 8.0 to 10.0 inches. The greatest num- 
ber — 67 — is found at 9.0 inches. At the one-half inch be- 
low, there are 40, and at the one-half inch above, 63 cases. 
That is, the distribution is unequally balanced near the maxi- 
mum and “tails off” more below than above it. It is not 
strictly of the normal type. If more ears were included in 
the sample, the form of distribution would appear more regular.
Accordingly, in smoothing the curve to take account of
this fact, a continuous line should be drawn near to but not
at the various points in the distribution. The curve used to
smooth the sample measurements should be rounded out as
the larger frequencies are approached and inclined toward the
vertical as they fall off. The smooth curve in Figure 45 is
intended to fit the sample and not to generalize the distribution
of an ideal curve relating to such measurements.

^ See Chapter XI, pp. 367-370.

FIGURE 45

Smoothed Frequency Distribution of Lengths of Ears of Corn

(Frequency Distribution, Continuous Series)

[Vertical axis: Number of Ears]
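The figures quoted for the sample can be verified directly from Table 26, and a rough numerical counterpart of freehand smoothing can be tried on them. The sketch below is offered with some caution: the 1-2-1 weighted moving average is a stand-in chosen here for illustration and is not the graphic procedure described in the text, and the names used are hypothetical.

```python
# Lengths (inches) and numbers of ears, as given in Table 26.
lengths = [3.0 + 0.5 * i for i in range(19)]
ears = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]

total = sum(ears)
# Cases lying in the groups 8.0 to 10.0 inches inclusive.
central = sum(n for x, n in zip(lengths, ears) if 8.0 <= x <= 10.0)
# Position of the most common measurement (the maximum ordinate).
peak = max(range(len(ears)), key=ears.__getitem__)


def smooth_121(values):
    """Weighted moving average with weights 1-2-1 (end values left as they are)."""
    out = [float(v) for v in values]
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + 2 * values[i] + values[i + 1]) / 4
    return out


smoothed = smooth_121(ears)
print(total, central, lengths[peak], ears[peak])
print([round(v, 2) for v in smoothed])
```

Under this particular rule the ordinate at 9.0 inches falls from 67 to 59.25, which is one way of seeing that the amount of smoothing applied to a sample is a matter of judgment rather than of mechanical rule.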



In any continuous series, as the class intervals into which
measurements are grouped are made smaller, and as the accuracy
of measurement is made more precise (the number of
observations being large), the lines drawn from successive ordinates
appear smooth and regular. On the other hand, if
the observations are few, or, if the groups into which they are
placed are chosen without regard to the distribution in normal
curves, then the lines connecting successive ordinates have a
step-like, halting appearance, foreign to continuous series. In
grouping data of the continuous type, the “classes should be
only just broad enough to make the distribution fairly smooth,
that is, there should be no vacant classes except very near the
extremes of the range, and a gradual increase from one extreme
up to the maximum and then a gradual decrease to the
other extreme, if there is only one maximum in the distribution
as is, in general, the case with these populations.”^ A
smoothed curve serves the purpose of idealizing such a group- 
ing in keeping with the normal type of distribution. Any 
pronounced tendency of distribution in a continuous series, 
shown by a fair and adequate number of samples, will tend to 
be confirmed if more are taken. On the other hand, if only 
a few are studied and the resulting curve tends to be very 
irregular, it is likely that further sampling will give a more 
characteristic tone to the distribution, making less pronounced 
both the exceptionally large and small frequencies. Whether 
a smoothed curve should exaggerate or minimize the peculiar 
properties of a distribution depends upon how accurately the 
samples characterize the complete series.^ 

How fully this is done by any series of samples is not always 
evident. While some smoothing is always admissible for con- 
tinuous series, smoothed curves should not be used indiscrimi- 
nately in place of the original data. The measurements of 
the samples and the frequencies with which they occur often 
serve as the best available approximation to the ideal which it 
is the purpose of the smoothed curve to give.

^ Davenport, Eugene, and Rietz, Henry L., op. cit., p. 27.

^ To the rule “that the top of the curve usually overtops the highest
point of the frequency polygon, especially when the classes are rather
large” (King, W. I., Elements of Statistical Method, Macmillan & Company,
New York, 1912, p. 113), the criticism is pertinent that the determining
factor is not so much the size of the groups as it is the representative
character of the samples.

2. PLOTTING CUMULATIVE FREQUENCY SERIES

The foregoing discussion of graphic representation has had
to do with simple frequency series: that is, series in which the
numbers of instances refer to the respective measurements or
to the groups into which they are placed. But the frequencies
may be cumulated: that is, added together, the effect of this
being to include together successive measurements or groups
as the case may be. Each frequency class, therefore, is made
to include all of the lower or all of the upper classes, depending
upon the manner in which the cumulating is done. It may be
begun with either extreme measurement, the only essential
being, if all cases are to be included, that it be carried through
the entire range of frequencies. If it proceeds from the least
to the greatest, the frequencies at each step are read “less
than”; if from the greatest to the least, “more than.” It will
be noticed that the cumulations when read “less than” refer
to the upper limits, and when read “more than,” to the lower
limits of the respective groups. This method of stating the
frequencies is used in Table 29.

Both discrete and continuous series may be cumulated and 
the resulting frequencies graphically illustrated. The way in 
which the cumulating is done is the same in both series but 
the graphic representations are different. The following dis- 
cussion will serve to make this clear. 

(1) “Graphic” Representation of Discrete Frequency Series
Cumulated

The discrete series in Table 25, p. 226, may be cumulated on 
a “less than” or on a “more than” basis. Within the limits set 
by the simple series, any grouping desired may be used. Dif- 
ferent methods of cumulating are shown in Tables 27 and 28. 

A system of rectangular co-ordinates, as shown in Figure 44, 
is used to illustrate cumulative as well as simple frequency 
distributions. The groups are measured on the abscissa or 
X axis, and the frequencies, on the ordinate or Y axis, equal 
distances on either axis always representing equal quantities 
as in the case of simple frequency series. When the succes- 
sive groups are indicated from left to right along the X axis, 
the frequencies cumulated on a “less than” basis tend to increase,
successive intervals including all of the frequencies
which belong to the lower classes as well as those at a given
position. When they are cumulated on a “more than” basis,
the frequencies from left to right tend to decrease, successive
intervals including only the remaining frequencies as well as
those in the class in question.

TABLE 27

Cumulations of Weekly Wage-Rates on a “Less than” Basis
(Simple Frequency Series, p. 226)

“A”

Weekly Wage-Rate Groups      Cumulated Frequencies
Total                               3,189
Less than $ 5.00                       88
Less than  10.00                    1,758
Less than  15.00                    2,713
Less than  20.00                    3,039
Less than  25.00                    3,122
Less than  30.00                    3,166
Less than  35.00                    3,175
Less than  40.00 *                  3,189

“B”

Weekly Wage-Rate Groups      Cumulated Frequencies
Total                               3,189
Less than $ 8.00                      779
Less than  16.00                    2,879
Less than  24.00                    3,122
Less than  32.00                    3,175
Less than  40.00 *                  3,189

* Limit arbitrarily taken.


TABLE 28

Cumulations of Weekly Wage-Rates on a “More than” Basis
(Simple Frequency Series, Table 25, p. 226)

“A”

Weekly Wage-Rate Groups      Cumulated Frequencies
Total                               3,189
More than $20.00                       93
More than  15.00                      312
More than  10.00                    1,061
More than   5.00                    3,029
More than   0.00                    3,189

“B”

Weekly Wage-Rate Groups      Cumulated Frequencies
Total                               3,189
More than $22.00                       67
More than  18.00                      163
More than  14.00                      478
More than  10.00                    1,061
More than   6.00                    2,773
More than   2.00                    3,189

To combine the frequencies by successively widening the
groups does not change the fundamental nature of truly discrete
series. The frequencies, whether expressed in simple or
cumulated form, are distinct at each measure encountered.
Accordingly, a continuous line, whether irregular or smoothed,
ought not to be used to illustrate them. Successive accumulations
should be indicated by separate bars located at the
abscissa units. For instance, the cumulations on a “less than”
basis, as shown in part “A” of Table 27, would appear as in
Figure 46.

FIGURE 46

Bar Diagram Showing a Discrete Frequency Series Cumulated
on a “Less Than” Basis

[Horizontal axis: Weekly Wage-Rates]


On the other hand, if the series as cumulated in part “A”
in Table 28 (that is, on a “more than” basis) were illustrated,
the figure would appear as in Figure 47.

FIGURE 47

Bar Diagram Showing a Discrete Frequency Series Cumulated
on a “More Than” Basis



It would be absurd to connect the successive bars in this 
or the preceding illustration by irregular or by smoothed lines 
because nothing is known (beyond the information contained
in the more narrowly grouped simple frequency series) about
the wage-rates between the different groups. The series, how- 
soever grouped, is still discrete and it should not be made to 
appear continuous. 

If cumulations were made at precise amounts, as, for in- 
stance, those in Table 25, the successive ordinates should be 
drawn at intervals so marked on the abscissa axis. More- 
over, they should not be connected in any way. The amounts 
are discrete and they should be so represented. 
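As a rough, text-only sketch of the bar treatment recommended above, the part “A” cumulations of Table 27 may be drawn as separate bars with nothing joining them. The function name `bar_rows`, the bar character, and the width of 50 characters are arbitrary choices made for this illustration.

```python
# "Less than" cumulations from part "A" of Table 27 (upper limit in dollars).
cumulated = [(5, 88), (10, 1758), (15, 2713), (20, 3039),
             (25, 3122), (30, 3166), (35, 3175), (40, 3189)]


def bar_rows(pairs, width=50):
    """Return one text bar per group; the bars are separate and never joined."""
    largest = max(freq for _, freq in pairs)
    rows = []
    for limit, freq in pairs:
        # Bar length is proportional to the cumulated frequency.
        length = round(freq * width / largest)
        rows.append(f"Less than ${limit:>2}.00 | {'#' * length} {freq}")
    return rows


for row in bar_rows(cumulated):
    print(row)
```

Each group receives its own bar at its own abscissa position, so the figure says nothing about wage-rates lying between the limits shown.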

So much for the representation of discrete series. In what
way is graphic illustration different in the case of series which
are continuous? This question is answered in the following
section.

(2) Graphic Representation of Continuous Frequency
Series Cumulated

Frequency series may be continuous as to the unit of mea- 
surement and discrete as to the frequencies. Let us take an 
example of such a series and discuss its graphic representation 
when the instances are cumulated. 

Table 29 shows the number of towns in the United States
classified according to the prices paid for oil in 1904. The
unit (price) is in fact continuous, although as customarily 
stated it is discrete. In this case, we shall consider it to be 
continuous. For purposes of illustration, one-tenth part of a 
cent is taken as a convenient, although arbitrary, division. 
The frequencies, however, are discrete, numbers of instances 
being used. 

The second column of Table 29 shows a simple frequency 
distribution of the towns classified according to prices paid. 
Columns three and four, respectively, show the frequencies 
cumulated on a “less than” and on a “more than” basis.
Cumulative graphs or ogives of the series are shown in Figure
48. The direction of the “less than” curve is from the lower
left-hand to the upper right-hand corner; and that of the
“more than,” from the upper left- to the lower right-hand
corner of the figure.

As the cumulations are made in Table 29, and as they 
should be read on the curve, the frequencies which are ex- 
pressed on a “less than” basis always refer to the upper sides, 
and those on a “more than” basis to the lower sides of the 
groups. For instance, the number of towns where prices are
10 cents or less is 914; the number in which they are more
than 10 cents is 916.
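The two cumulations can be reproduced from the simple frequencies of Table 29. The following is a minimal sketch; the variable names are illustrative, and entering the six empty classes as zeros is a convenience assumed here.

```python
from itertools import accumulate

# Simple frequencies from Table 29, one per half-cent class beginning with
# 6.0-6.5 cents; the six empty classes from 19.6 to 22.5 are entered as 0.
simple = [11, 17, 27, 36, 123, 181, 281, 238, 201, 162, 130, 85, 65, 49,
          26, 19, 43, 38, 23, 12, 13, 20, 8, 7, 6, 4, 1,
          0, 0, 0, 0, 0, 0, 1, 3]
upper_limits = [6.5 + 0.5 * i for i in range(len(simple))]

total = sum(simple)
# "Less than": towns at or below the upper limit of each class.
less_than = list(accumulate(simple))
# "More than": towns above the upper limit of each class.
more_than = [total - c for c in less_than]

i = upper_limits.index(10.0)
print(total, less_than[i], more_than[i])
```

The 916 obtained here is the number above 10.0 cents; Table 29 prints the same figure against the class beginning at 10.1 cents, since its “more than” column refers to the lower side of each group.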

TABLE 29

Table Showing the Distribution of Towns According to Prices
Paid for Oil, Freight Deducted (1830 Quotations), December,
1904, for the United States

(Report of the Commissioner of Corporations on the Petroleum
Industry, Part II, Aug. 5, 1907, p. 951)

                                 Number of Towns in the United States
Price, Less Freight              Simple       Cumulative Frequency
(Cents per gallon)               Frequency    “Less than”   “More than”

Total                              1,830           -             -
 6.0 to and including  6.5            11           11         1,830
 6.6 to and including  7.0            17           28         1,819
 7.1 to and including  7.5            27           55         1,802
 7.6 to and including  8.0            36           91         1,775
 8.1 to and including  8.5           123          214         1,739
 8.6 to and including  9.0           181          395         1,616
 9.1 to and including  9.5           281          676         1,435
 9.6 to and including 10.0           238          914         1,154
10.1 to and including 10.5           201        1,115           916
10.6 to and including 11.0           162        1,277           715
11.1 to and including 11.5           130        1,407           553
11.6 to and including 12.0            85        1,492           423
12.1 to and including 12.5            65        1,557           338
12.6 to and including 13.0            49        1,606           273
13.1 to and including 13.5            26        1,632           224
13.6 to and including 14.0            19        1,651           198
14.1 to and including 14.5            43        1,694           179
14.6 to and including 15.0            38        1,732           136
15.1 to and including 15.5            23        1,755            98
15.6 to and including 16.0            12        1,767            75
16.1 to and including 16.5            13        1,780            63
16.6 to and including 17.0            20        1,800            50
17.1 to and including 17.5             8        1,808            30
17.6 to and including 18.0             7        1,815            22
18.1 to and including 18.5             6        1,821            15
18.6 to and including 19.0             4        1,825             9
19.1 to and including 19.5             1        1,826             5
19.6 to and including 20.0             -            -             -
20.1 to and including 20.5             -            -             -
20.6 to and including 21.0             -            -             -
21.1 to and including 21.5             -            -             -
21.6 to and including 22.0             -            -             -
22.1 to and including 22.5             -            -             -
22.6 to and including 23.0             1        1,827             4
23.1 to and including 23.5             3        1,830             3

FIGURE 48

Cumulative Graphs (Ogives) Constructed on “More Than” and
“Less Than” Bases, Showing by Towns the Classified
Prices of Oil

In graphically illustrating this series, the respective ordinates
showing the number of towns are connected by straight
but irregular lines. This is admissible because from ordinate
to ordinate the price differences are gradual, each amount being
only an approximation (in this instance to the nearest
tenth of a cent) to the “true” price. Such lines represent the
gradual changes, but they do not idealize them as would a
smoothed curve, designed to “fit” the distribution. The continuous
lines are intended to illustrate the measurements in
this particular sample, rather than to generalize from it as
to the nature of the distribution from a total “population” of
this sort.

The frequencies in a continuous series may be indicated as
relating to a precise measurement. This is done in the example
showing the number of ears of corn of different lengths.

Each measurement, as was said, is only an approximation
to the “true” length. The case is the same in the following
example showing the lengths of time in 61 trials which it takes
a mechanic to “thread” a standard bolt. The number of frequencies
at each time (approximations to the nearest quarter
of a minute) is given in Table 30.

TABLE 30

Lengths of Time Taken to “Thread” a Standard Bolt
(Measurement to nearest quarter of a minute)

                 Frequencies
Minutes     Simple     Cumulated “Less than”
Total         61              -
5¼             2              2
5½             3              5
5¾             5             10
6              6             16
6¼             8             24
6½            12             36
6¾             9             45
7              7             52
7¼             4             56
7½             3             59
7¾             2             61

Suppose it were desired graphically to illustrate the 
cumulated frequencies at successive intervals. Since time is 
continuous, the frequencies in reality have reference not to 
the measurements as stated but to approximations to them. 
Accordingly, account should be taken of this fact in the 
graphic figure. The way in which it is done is illustrated in 
Figure 49, and may be described as follows: 

At each successive interval on the abscissa axis, the number
of frequencies is indicated by dots according to the scale provided
on the ordinate. Beginning at the shortest time, 5¼
minutes, two dots equally spaced are inserted. With these as
the total for this period, three dots for the second interval,
5½ minutes, are added with the upper dot for the preceding
time unit as a base. This process is continued until the frequencies
at the different positions are inserted. A continuous
line is then drawn through the middle points of the consecutive
vertical rows of dots. It is this line which properly represents
the cumulations. This follows because (1) not all of the frequencies
assigned to the respective measurements actually fall
upon them (they fall “around” them), and (2) there are probably
as many measurements in excess as in defect of the
approximate time, the number of instances being uniformly
distributed over a quarter of a minute. Accordingly, if the
dots, each of which represents an approximate period, are
supposed to lie upon the continuous line rather than to have a
vertical position, the continuity of the series is illustrated.
The positions which they would then assume are indicated on
the figure by the small arrows.

FIGURE 49

Cumulative Graph of a Continuous Frequency Series Showing
Length of Time Taken to “Thread” a Standard Bolt

(Basis of Cumulation: “Less Than”)

Continuous straight lines connecting the middle points of 
the different ordinates properly illustrate the nature of the 
cumulation in this sample. If, however, it were taken to 
characterize a “population” of this sort, the connecting line
should be smooth and free from all angles. 
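The points through which the straight connecting lines pass can be computed from Table 30. The sketch below assumes, as an interpretation adopted here, that each column of dots spans the heights from the preceding cumulative total to the new one, so that its middle point lies half a frequency above the preceding total; the function name is hypothetical.

```python
from fractions import Fraction

# Table 30: time in minutes (to the nearest quarter) and simple frequencies.
times = [Fraction(21, 4) + Fraction(k, 4) for k in range(11)]
simple = [2, 3, 5, 6, 8, 12, 9, 7, 4, 3, 2]


def ogive_points(xs, freqs):
    """Return (time, height) pairs at the middle of each column of dots."""
    points, below = [], 0
    for x, f in zip(xs, freqs):
        # Half of this column's dots lie below the line and half above it.
        points.append((x, below + Fraction(f, 2)))
        below += f
    return points


for x, y in ogive_points(times, simple):
    print(float(x), float(y))
```

Joining these points by straight lines gives the cumulative graph for the sample; a smoothed curve through them would be appropriate only if the sample were taken to characterize the whole “population.”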

From the foregoing discussion, it should be apparent that 
the methods of graphically illustrating simple and cumulated 
discrete and continuous frequency series are fundamentally 
different. Choice of methods depends upon the nature of the 
series. No careful student will select methods purely at 
random. The requirements in each case are different and these 
should be observed. Graphic figures should not only be ac- 
curately drawn but selected according to their appropriateness. 
To make such selection calls for more than mere cleverness 
and ability to draw. 

IV. Graphic Presentation of Historical or
Time Series^

The ways in which discrete time series should be illustrated
are discussed in Chapter VII under the heading Diagrammatic
Presentation. Time, of course, is continuous, but as has been
said in a number of places, measurements in time may be discrete
or continuous. Those of the first type should be indicated
by vertical or horizontal bars; those of the second, by
unbroken lines.

^ See also Chapter XIV, passim.

In graphically presenting continuous time series, a number 
of problems present themselves. These have to do with 
(1) choice and adjustment of scales, (2) the type of lines 
connecting successive ordinates, and (3) curve smoothing. In 
keeping with the outline plan of treatment of frequency series, 
simple and cumulative historical curves or graphs will be 
discussed separately. 

1. PLOTTING SIMPLE HISTORICAL SERIES 

Simple historical series are those in which amounts or fre- 
quencies relate to instants or intervals of time. Cumulated 
historical series, on the other hand, are those in which amounts 
or frequencies are totaled at successive instants or intervals of 
time. It is the first type with which we are now concerned. 

(1) Choice and Adjustment of Scales 

A system of rectangular co-ordinates, as shown in Figure 44, 
is used to illustrate time series. The time units are placed 
on the abscissa or X axis, and the amounts or frequencies on 
the ordinate or Y axis. Since time has no beginning, a hori- 
zontal zero is unnecessary; the first units may, as convenience 
demands, be indicated near or removed from the point of ori- 
gin at the intersection of the two axes. The ways in which 
the time units are shown, however, differ according to the 
nature of the measurements. If they are taken at successive 
instants, as would be the case, for example, in the measurements
of temperature, the unit on the horizontal axis is indicated
as a point. If the measurements are in the nature of
totals which accumulate during a period, as would be the case,
for instance, in sales by years, then the unit on the abscissa
is indicated as a space. In both cases, however, the abscissa
axis should be divided into equal parts, each one representing
instants equally removed or periods of equal length.

a. Natural Scale or “Difference” Charts

The ordinate scale, when amounts or frequencies are shown, 
should begin with zero, since they are always reckoned from 
it as a starting point. If this rule cannot be followed, atten- 
tion to its violation should be indicated in some unmistakable 
way. This can be done by using a star (*) and a footnote
calling attention to the fact, or better by drawing a wavy
line across the ordinate axis and parallel to the X axis.

Equal space units on the ordinate scale should represent
equal amounts. But “equal amounts” may have reference to
quantities or to ratios, and these are not the same. If a scale
of ratios is used, a zero line is unnecessary; in fact, there is
no zero in such cases.^
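The distinction between the two senses of “equal amounts” can be put numerically. In the sketch below the amounts are hypothetical values chosen only because they double at each step, and representing the ratio scale by common logarithms is an assumption made for illustration.

```python
import math

# Hypothetical amounts that double at each step (illustrative values only).
amounts = [100, 200, 400, 800]
pairs = list(zip(amounts, amounts[1:]))

# Natural ("difference") scale: vertical distance is the difference in amount.
natural_steps = [b - a for a, b in pairs]

# Ratio scale: vertical distance is the difference of logarithms, so equal
# ratios occupy equal spaces; there is no zero line, since log 0 is undefined.
ratio_steps = [math.log10(b) - math.log10(a) for a, b in pairs]

print(natural_steps)
print([round(s, 4) for s in ratio_steps])
```

On the natural scale the three steps occupy unequal spaces (100, 200, and 400 units), while on the ratio scale each doubling occupies the same space.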

In deciding upon the proportions between the respective 
scales, the aim should be (1) to allow ample room for the 
illustration itself and for the data which it shows to be in- 
cluded on the graph, (2) neither to over-emphasize nor to 
dwarf the extreme fluctuations, (3) to bring out the charac- 
teristics of the changes over the entire period and from time 
to time (instant or interval). Bowley states the problem and
the way in which it should be met in the following language:

“It is only the ratio between the horizontal and the vertical scales
that needs to be considered. The figure must be sufficiently small
for the whole of it to be visible at once; if the figure is complicated,
relating to a long series of years and varying numbers, minute accuracy
must be sacrificed to this consideration. Supposing the horizontal
scale decided, the vertical scale must be chosen so that the
part of the line which shows the greatest rate of increase is well
inclined to the vertical, which can be managed by making the scale
sufficiently small; and, on the other hand, all important fluctuations
must be clearly visible, for which the scale may need to be increased.
Any scale which satisfies both of these conditions will fulfill its
purpose.”^

^ See the discussion of Ratio Scales and Ratio Charts, infra, pp. 248-255.

The two scales selected will, in a given case, depend, among 
other things, upon the size of the page, the ability of the eye 
to view the illustration as a whole, and the subsequent uses to 
which it is to be put. In the latter respect, a graph used as 
a working paper will differ from one prepared for publication. 
The above discussion of the proportions between the respec- 
tive scales has reference to illustrations involving but a single 
series. When two or more curves are to be placed in the same 
illustration, the case is complicated in the following, among 
other, ways: (1) the amplitude of the fluctuations may be 
noticeably different, (2) they may refer to different periods of 
time, (3) they may be measured in units of widely different 
size, or in entirely different units. Any or all of these
conditions make compromises of one sort or another necessary.

If the amplitudes of the fluctuations are widely different, 
and one chart is used, two ordinate scales may be required if 
actual amounts are plotted, that is, if equal spaces show equal
amounts rather than equal ratios. The same may be true if 
the amounts differ greatly in size. In this case, a single scale 
may be used if it is broken or made discontinuous, one portion 
fitting the smaller, and one the larger amounts. The place at 
which the scale is broken should be indicated by a wavy line 
being drawn across the entire chart. To do this has the 
advantage of bringing the two parts of the charts closely to- 
gether, but the disadvantage of leaving the upper part without 
an evident zero base. This is to be avoided whenever possible. 
In such cases, it is preferable to use separate scales, both be- 
ginning at zero. 

* Bowley, A. L., Elements of Statistics, King, London, 1911, p. 149.

When two or more series of data are placed on a single
chart, it is sometimes necessary, when difference rather than
ratio changes are shown, to convert one scale into terms of
the others. Some of the ways in which this can be done are
as follows:

(1) By choosing separate scales and making each proportional
to the respective averages of the series. Such an
adjustment is made in Figure 50. Each of the curves must
then be read in terms of its own scale, the amounts being in
fact deviations, plus and minus, from their respective averages.


FIGURE 50

Capital and Clearings of New York Clearing House Banks, 1902-1915

(Method of Scale Conversion)

[Ordinate scales: Millions and Billions]



(2) By expressing the items in the series as percentages of
their respective totals, and plotting the deviations. When two
or more series treated in this form are plotted on a single
chart (a) relative rather than absolute quantities are shown,
and (b) the respective curves may be far removed from each
other, neither beginning nor ending at the same position on
the ordinate axis.

(3) By expressing the items of the respective series as
percentages of the first or last amount. This method of treating
different series, as shown in Figure 51, has the effect (a)
of beginning or ending, as the case may be, the different curves
at the same positions on the ordinate axis, and (b) of making
the nature and the amount of deviation in the different
series, as well as in the same series, directly comparable with
each other, since the base amounts are treated as equal
(100 per cent) in computing the percentages. It has the
disadvantages that (a) relative rather than absolute amounts are
plotted, (b) the curves may lie too close together, and (c) the
first or the last item may not be suitable as a base.
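
The three conversions listed above can be put in numerical form. The following is only a sketch in Python; the function names and the two short series are invented for illustration and are not the data of Figures 50 and 51. Method (1) is rendered here as the ratio of each item to its series average, which is one reading of scales "proportional to the respective averages."

```python
def relative_to_average(series):
    """Method (1): express each item as a ratio to the series average,
    so that the averages of different series fall at the same height."""
    average = sum(series) / len(series)
    return [x / average for x in series]


def percent_of_total(series):
    """Method (2): express each item as a percentage of the series total."""
    total = sum(series)
    return [100 * x / total for x in series]


def percent_of_base(series, base="first"):
    """Method (3): express each item as a percentage of the first or last item."""
    base_value = series[0] if base == "first" else series[-1]
    return [100 * x / base_value for x in series]


# Hypothetical data: two series measured in units of very different size.
capital = [100.0, 110.0, 120.0, 150.0]
clearings = [60.0, 75.0, 90.0, 75.0]

capital_index = percent_of_base(capital)
clearings_index = percent_of_base(clearings)
```

Under method (3) both converted series begin at 100, so that the curves start from the same position on the ordinate axis, as the text describes.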


FIGURE 51

Capital and Clearings of New York Clearing House Banks, 1902-1915
Adjustments such as those described above are necessary
only when the ordinate scale shows actual amounts and
differences. When ratio changes are illustrated, they are
unnecessary, because at any position on the ordinate axis equal
ratios are indicated by equal spaces. A hundred per cent
increase, whether representing the change from 2 to 4, 4000 to
8000, or 250,000 to 500,000, etc., always takes the same
vertical space.
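
The statement that a hundred per cent increase always takes the same vertical space can be checked with logarithms. The sketch below assumes that distance on a ratio scale is measured by the difference of common logarithms; the function name is illustrative.

```python
import math


def vertical_space(start, end):
    """Distance between two amounts on a ratio (logarithmic) scale,
    measured in units of log10."""
    return math.log10(end) - math.log10(start)


# The three doublings named in the text.
doublings = [(2, 4), (4000, 8000), (250_000, 500_000)]
spaces = [vertical_space(start, end) for start, end in doublings]
```

Each of the three values equals log10(2), about 0.301, whatever the size of the amounts concerned.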

Charts designed to show ratio changes are discussed in the 
section immediately following. 

If the time intervals are different in two series, and both
are to be placed upon the same chart, an adjustment of the
abscissa scale is necessary. In all cases, however, equal units
on this axis should represent equal periods of time or instants
equally distant apart. If, for example, one series is given
by months, and another one by years, the same space cannot
be allotted to both periods. If this were done, the time changes
in each series but not those in different series would be
comparable.


b. Ratio Scales and “Ratio” Charts

Ordinate axes may show either actual or ratio changes in 
time series. If the former, equal spaces will indicate equal 
differences, positive or negative; if the latter, they will show 
equal rates of change. But spaces on an ascending scale in- 
dicating a given rate of increase do not show on a descending 
scale the same rate of decrease. That this is so may be seen 
from a simple example. A change from 100 to 200 represents 
an increase of 100 per cent, but a change from 200 to 100 is 
a decrease of 50 per cent. The reason for the difference is 
that, in the first case, the base is 100; in the latter, 200. That 
is, different bases are used in computing increases and de- 
creases. 
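
The arithmetic of the example may be set out as follows. This is a minimal sketch; the function name is illustrative, and the percentage is computed on the earlier amount as base, as in the text.

```python
def percent_change(old, new):
    """Percentage change from old to new, computed on the old value as base."""
    return 100 * (new - old) / old


rise = percent_change(100, 200)   # base 100
fall = percent_change(200, 100)   # base 200
```

The rise is 100 per cent and the fall 50 per cent, although the same two amounts are involved in both cases.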

Comparable “difference” and “ratio” scales (arithmetic and
geometric progressions) are shown in Figure 52.
FIGURE 52

A Natural or Difference Scale Contrasted with a Percentage
or Ratio Scale



Rates of change may be shown graphically in either of
two ways: (1) by plotting the logarithms of the amounts on
a difference scale, or (2) by plotting the amounts themselves
on a logarithmic or ratio background. The latter alternative
is simpler and preferable because (1) the meaning of logarithms
of numbers is not generally understood, and (2) specially
prepared paper is available upon which ratio changes
may be plotted, while log tables are not always accessible.
Ratio paper is prepared in a variety of forms of which the
following is an illustration.

FIGURE 53 

Illustration of How Different Scales May Be Placed on a
Ratio Background
Alternative methods of showing the same facts (1) on a
“difference” and (2) on a “ratio” ^ basis are given in
Figures 54-55.^

FIGURE 54 FIGURE 55

Difference and Ratio Charts Showing the Changes in Funds “A” and “B”




Figure 54 shows two series, “A” and “B,” plotted on ordinary
arithmetic rulings, equal spaces representing equal amounts.
From the figure it appears that the rate of increase in series
“B” is more rapid than in series “A.” This, however, is not
the case, as is shown in Figure 55, in which the series are
drawn on a ratio background. Twenty per cent each year is
added to the items in both series. The uniform rate of
increase is properly brought out in the ratio chart, Figure 55.
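
The contrast between the two charts can be reproduced with figures. In the sketch below the starting amounts of the two funds are assumptions, since the text gives only the rate of twenty per cent; the function names are illustrative.

```python
import math


def grow(start, rate, years):
    """Amounts after compounding `rate` once a year for `years` years."""
    return [start * (1 + rate) ** t for t in range(years + 1)]


def yearly_differences(series):
    """Steps as they appear on a difference (arithmetic) scale."""
    return [b - a for a, b in zip(series, series[1:])]


def yearly_log_steps(series):
    """Steps as they appear on a ratio (logarithmic) scale."""
    return [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]


fund_a = grow(100.0, 0.20, 5)    # hypothetical smaller fund
fund_b = grow(1000.0, 0.20, 5)   # hypothetical larger fund
```

On the difference scale every yearly step of the larger fund is ten times that of the smaller, which suggests a faster rise; on the ratio scale every step of both funds equals log10(1.2), so that the two curves run parallel.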

^ Ratio paper in different sizes may be secured, among others, from The
Education Exhibition Company, New York; Keuffel and Esser, Chicago
and New York; Standard Graph Company, New York; Codex Book
Company, New York.

^ The figures are reproduced with permission from Bivins, P. A., “The
Ratio Chart and Its Applications,” The Engineering Magazine, New
York, July 1, 1921, p. 2 (Reprint).
Figures 56 and 57,^ respectively, show the volume of sales
of three products plotted on a difference and a ratio basis. On
account of the limits of the scale, product “c” is plotted twice,
the lower part of Figure 56 having a larger scale than the
upper part. In Figure 57, the rates of movement of the
respective products can be easily compared, two “cycles” of
ratio ruling being used to show the movements. This chart
illustrates the advantage of the ratio basis of showing amounts
widely different in size. No complicated method of scale
conversion is necessary, as is so often the case under such
circumstances when a natural or difference scale is used.
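
The number of “cycles” of ratio ruling required for a given set of amounts can be estimated beforehand. The sketch below assumes that each cycle is labelled from one power of ten to the next (10 to 100, 100 to 1,000, and so on); paper labelled from other starting points may need fewer cycles. The function name and the amounts are illustrative.

```python
import math


def cycles_needed(smallest, largest):
    """Number of tenfold cycles of ratio ruling needed to span the data,
    assuming each cycle is labelled from one power of ten to the next."""
    if smallest <= 0 or largest < smallest:
        raise ValueError("amounts must be positive and ordered")
    bottom = math.floor(math.log10(smallest))
    top = math.ceil(math.log10(largest))
    return max(1, top - bottom)


# Hypothetical sales volumes ranging from 30 to 4,000 units:
# rulings 10-100, 100-1,000 and 1,000-10,000 are required.
cycles = cycles_needed(30, 4000)
```

Amounts of zero or less cannot be placed on a ratio background at all, which is why the function rejects them.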

FIGURE 56 FIGURE 57 

Difference and Ratio Charts Showing the Changes in Volume
of Sales of Three Products




The advantages of the “ratio” chart have been summarized 
by various writers, ^ but no more tersely than by Professor 
Irving Fisher. He says: 

^ Ibid., p. 3.

^ See Field, James A., “Some Advantages of the Logarithmic Scale in
Statistical Diagrams,” Journal of Political Economy, October, 1917, pp.
806-841. This article is reprinted in the author’s Readings and Problems
in Statistical Methods, Macmillan & Company, New York, 1920,
pp. 282-305.
“The eye reads a ratio chart more rapidly than a difference chart
or a table of figures. We may recapitulate what most easily catches
the eye as follows:

“1. If we see a curve ascending, and nearly straight, we know that
the statistical magnitude it represents is increasing at a nearly
uniform rate.

“2. If the curve is descending, and nearly straight, the statistical
magnitude is decreasing at a nearly uniform rate.

“3. If the curve bends upward, the rate of growth is increasing.

“4. If downward, decreasing.

“5. If the direction of the curve in one portion is the same as in
some other portion it indicates the same percentage rate of change
in both.

“6. If the curve is steeper in one portion than in another portion,
it indicates a more rapid rate of change in the former than in the
latter.

“7. If two curves on the same ratio chart run parallel they
represent equal percentage rates of change.

“8. If one is steeper than another the first is changing at a faster
percentage rate than the second.

“9. The imaginary straight line most nearly representing, to the
eye, the general trend of the curve, is its ‘growth axis,’ and
represents the average rate of increase (or decrease); and the deviations
of the curve from this growth axis are plainly evident without
recharting.

“10. The slope of the imaginary line between any two points on
a curve indicates the average rate of change between the two.” *

FIGURE 58

Domestic Orders for Freight Cars and Locomotives, Plotted on
a Ratio Chart

FIGURE 59

Rate of Turnover of Bank Deposits, Plotted on a Ratio Chart

[Scale label: Annual]

FIGURE 60

Exports and Domestic Consumption of Cotton, Plotted on a
Ratio Chart

[Scale label: Millions]

Figures 58, 59, and 60 are inserted to illustrate the different 
uses of ratio charts. 
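
Fisher's tenth point, that the slope between two points on a ratio chart gives the average rate of change, has a simple numerical counterpart. The sketch below takes the average rate to be the constant percentage rate per period which would carry the first amount to the last; the function name and the sample amounts are illustrative.

```python
def average_rate(first, last, periods):
    """Average percentage rate of change per period between two amounts,
    i.e. the constant rate that would carry `first` to `last`."""
    return 100 * ((last / first) ** (1 / periods) - 1)


two_year_rise = average_rate(100, 144, 2)   # about 20 per cent per year
one_year_fall = average_rate(200, 100, 1)   # a fall of 50 per cent
```

A rise from 100 to 144 in two periods thus corresponds to about 20 per cent per period, which is the slope of the straight line joining the two points on the ratio chart.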

(2) Types of Lines Connecting Successive Ordinates

Amounts or frequencies in historical series are generally 
cumulated through a period of time. This is the case, for 
instance, respecting exports, bank clearings, and industrial
failures, reported by days, months, years. When they are 
plotted, the ordinates show what has been accomplished dur- 
ing, and not their characteristics in, such periods, deviations 
from which may be positive or negative. But the time inter- 
vals are arbitrary since time is continuous. The cumulations 
are functions of the periods selected in which to express the 
facts. Accordingly, while the amounts on the abscissa scale 
should be indicated as applying to the close of the periods in 
question, the respective ordinates should be connected by con- 
tinuous smoothed lines. Such lines give a picture of the prob- 
able cumulations thought of as occurring through continuous 
time. Of course, if the periods are looked upon as discrete — 
which they are not — then a smoothed and continuous curve 
does not truly represent the facts. From this point of view, 
cumulation is begun anew at the beginning of each period 
and is completed at its end. But periods have neither be- 
ginnings nor endings except as arbitrarily conceived. To look 
upon them as discrete is absurd. Each ordinate is simply a 
conventional stopping place — it may be made earlier or later. 
If it is altered, then the cumulations are changed. Graphi- 
cally, a continuous smoothed line shows the probable changes 
at all possible intervals comprehended in the entire period to 
which the data refer. 

* Fisher, Irving, “The ‘Ratio’ Chart for Plotting Statistics,” Quarterly
Publications of the American Statistical Association, June, 1917, pp.
597-598.





Of course, too much liberty may be taken in drawing a 
smoothed line. The heights of the ordinates should be closely 
followed if the smoothed curve is taken to represent the prob- 
able cumulations of the case in question. If it is intended 
to represent an ideal cumulation, then the case is somewhat 
different. In the latter case, the question immediately arises: 
What is the “ideal” which is to be shown? Until it can be
answered, smoothing should not be too “free hand.”

On the other hand, certain historical series represent, not 
accumulations at the close of arbitrary periods, but charac- 
teristic facts, deviations being positive or negative, and coin- 
cident with the passage of time. Of such a nature are those 
relating to changes in temperature, barometric pressure, 
ratios of expenses to sales and of assets to liabilities,
turnovers of bank deposits, etc. For such series, ordinates should
be erected at the middle points of the time-units and be
connected by smoothed lines. In reality, they are composed of a
succession of continuous frequency series, because not only
time but also the measurements are continuous. The units
on both axes are arbitrary and artificial. Under such circum- 
stances, smoothed curves give more than a direction of trend: 
they idealize both the units and the measurements. 

When related series are plotted on the same chart, they 
should be designated by similar but distinguishable lines. On 
the other hand, lines which lie closely together or frequently 
cross each other should be drawn so as not to be confused. 
Since the use of lines of different color is generally prohibitive 
in cost, it is necessary to choose distinctive types of the same 
color where many curves are drawn upon one sheet. Lines 
should be broad enough to be readily followed, but not so 
broad as to sacrifice the accuracy of the ordinate unit. 

(3) Purposes and Methods of Smoothing Historical
or Time Series

The methods used to smooth historical or time series depend
upon the purposes to be accomplished thereby. The two
major purposes are (1) to secure a general notion of direction
or trend, and (2) to analyze trends into their component
parts as preliminaries to comparisons. Changes in time may
be of four different types: (1) long-time or secular, (2)
seasonal, (3) cyclical or periodic, and (4) “residual,” a term
meant to cover all “other” types. Different methods of treating
time series so as to isolate the first three classes of
movements are discussed later in Chapter XIV.^ The discussion
at this point has to do with the first purpose.

If nothing more than a knowledge of general direction is
desired, the free-hand method may suffice. If it is inadequate,
the method of “moving averages” or “progressive means” may
be used in series which are cyclical or periodic. This method
involves (1) fixing approximately the length of the cycle,
(2) totaling the frequencies or amounts for the first complete
cycle, and taking the arithmetic average, (3) dropping off the
first and adding a new item, totaling the amounts, and taking
the arithmetic average, (4) continuing this process until the
entire series is exhausted, (5) plotting the different averages
at the middle points of each of the cycles, if they contain an
even number, or half-way between the middle points if they
contain an odd number of items.
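
The five steps just described can be sketched in Python. The function name and the sample amounts are invented for illustration. The timing convention is also an assumption: period k, counting from 0, is taken to run from time k to time k + 1, with its amount applying to its close, so that the middle of a window is found from the span of time it covers.

```python
def moving_averages(amounts, cycle_length):
    """Return (midpoint, average) pairs for every complete window.

    Period k (counting from 0) is taken to run from time k to time k + 1,
    so a window that starts with period i covers times i to i + cycle_length
    and its midpoint is i + cycle_length / 2. This timing convention is an
    assumption made for the sketch.
    """
    if cycle_length < 1 or cycle_length > len(amounts):
        raise ValueError("cycle_length must be between 1 and len(amounts)")
    # Step (2): total the first complete cycle and take the average.
    window_total = sum(amounts[:cycle_length])
    results = [(cycle_length / 2, window_total / cycle_length)]
    for i in range(1, len(amounts) - cycle_length + 1):
        # Step (3): drop the first item of the old window, add one new item.
        window_total += amounts[i + cycle_length - 1] - amounts[i - 1]
        # Step (5): record the average against the middle of its window.
        results.append((i + cycle_length / 2, window_total / cycle_length))
    return results


sales = [10, 14, 12, 8, 11, 15, 13, 9]   # hypothetical amounts, 4-period cycle
smoothed = moving_averages(sales, 4)
```

With eight items and a cycle of four there are five averages, and the first two and last two periods have no average of their own. Under the convention assumed, a window of an even number of periods is centered on the close of a period, while one of an odd number falls half-way between two such points.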

This process, however, leaves the beginning and the end 
of the series unsmoothed. If the direction of the smoothed 
curve is fairly definite, however, the remaining parts of the 
series may be covered (1) by projecting the curve at both 
ends in keeping with its general inclination, or (2) by assum- 
ing that data similar to those at the respective ends of the 
series are repeated and by continuing to use moving averages. 

^ Infra, pp. 441-457.

This method, however, can be used with precision only when
series are regularly cyclical or periodic. But how is this fact
to be determined? Inspection often suffices to suggest a cycle
but it does not define its exact length or its true periodicity.
To secure a general direction of trend, however, it is not
necessary to have precise knowledge in either respect. If the
approximate length of the cycle is used, moving averages will
give, for general purposes, sufficiently accurate results. The
more nearly it can be approximated, however, the better will
be the results obtained.

If a period which corresponds to a half cycle, for instance, 
is used, the resulting curve, while it will smooth out the minor 
fluctuations of the incomplete periods, will not materially af- 
fect the longer changes. If a period somewhat shorter or 
longer is taken, the smoothed curve will partake of both the 
short- and long-time changes. In cases where periods are so 
dissimilar that a distorted curve is secured by using an aver- 
age period, it is best not to employ the moving average 
method. 
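
The effect of the period chosen can be seen in an artificial case. The sketch below builds a hypothetical series, a steady rise of 1 per period with a regular 12-period wave added, and compares a moving average of the full cycle with one of a half cycle. The series, the names, and the measure of spread are all assumptions made for illustration.

```python
import math


def moving_average(values, window):
    """Simple moving averages over every complete window."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def spread(values):
    """Difference between the highest and lowest value."""
    return max(values) - min(values)


# Hypothetical series: a steady rise of 1 per period plus a 12-period wave.
series = [t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]


def detrended_spread(window):
    """Spread of the smoothed curve after removing the known rise of 1 per period."""
    smoothed = moving_average(series, window)
    centre = (window - 1) / 2
    return spread([m - (i + centre) for i, m in enumerate(smoothed)])


full_cycle = detrended_spread(12)   # wave removed: essentially zero
half_cycle = detrended_spread(6)    # wave still present
```

When the period equals the cycle the wave disappears and only the rise remains; when half the cycle is used a large part of the wave survives in the smoothed curve.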

If historical series are to be correlated or minutely com- 
pared, then neither the free-hand nor the moving average 
method can be used. The trends must then be isolated. Dif- 
ferent ways of doing this are discussed later.^ 

2. PLOTTING CUMULATIVE HISTORICAL OR TIME SERIES

Historical or time series relating to amounts or frequencies 
during a period of time may be cumulated. If, on the other 
hand, they have reference to characteristics of conditions at 
instants of time, they cannot be cumulated. Illustration will 
make the difference clear. If sales were available by months, 
the amounts at the successive intervals could be totaled so as 
to show the accumulation during any period of time. Sales 
in February could be added to those of January, and those of 
March to the combined total, etc., in the same way that suc- 
cessive frequencies are added in frequency series. On the 
other hand, temperature measurements at successive hourly 
intervals, ratios at different periods, etc., cannot be treated 
in this manner. To add or cumulate them is meaningless. 
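
The cumulation of monthly sales described above may be illustrated as follows. The amounts are hypothetical, and `itertools.accumulate` from the Python standard library is used merely as a convenient way of forming running totals.

```python
from itertools import accumulate

# Hypothetical sales for January, February, March and April.
monthly_sales = [120, 95, 130, 110]

# Each entry is the total from the beginning of the year to the close
# of the month concerned.
cumulative_sales = list(accumulate(monthly_sales))
```

The last cumulated amount equals the total for the whole period. Nothing in the arithmetic prevents the same operation being applied to temperatures or ratios, but the result, as the text points out, would be meaningless.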


^ See Chapter XIV, passim.
Successive ordinates in cumulated time series showing
amounts should be connected by smooth continuous lines. 
Whatever the time unit used, it is arbitrary: continuity is 
suggested by an unbroken smooth line. 

Ratio changes cannot be cumulated. To add and subtract 
successive ratios has no meaning. Moreover, a ratio chart is 
not suited to show cumulatively what has transpired. The 
scale showing increase has to be differently interpreted from 
that showing decrease. 


V. Conclusion 

The discussion in this chapter has emphasized graphic as
contrasted with diagrammatic presentation, attention being
given primarily to (1) the distinction between discrete and
continuous series and the manner in which they can be truly
illustrated; (2) the processes of smoothing frequency series,
and the meaning to be given to smoothed lines; (3) the
methods of cumulating series and their graphic representation;
(4) the use of difference and ratio scales in the graphic
representation of time series; (5) scale conversion and rough
methods of smoothing historical series; and (6) illustrations
of types of graphic charts in current use.

Clear thinking about graphic representation and consistent
use of devices for this purpose require that distinction be made
between diagrams (pictorial illustrations) and lines and
points fixed by a system of co-ordinates.

References 

American Telephone & Telegraph Company, New York,
“Graphical Method of Smoothing a Series of Frequency Curves,”
Statistical Bulletin, No. 2, April, 1921.

“Introduction to Graphic Methods, Part I,” Statistical Bulletin,
No. 3, November, 1921; Part II, No. 5, June, 1922.

Bivins, Percy A., “The Ratio Chart and Its Applications,” The
Engineering Magazine, The Engineering Magazine Company, New
York, July, August, September, October, 1921.

Bowley, A. L., Elements of Statistics, King, London, 1911, Chapter
VII, Sections I, II, III, IV, pp. 143-188.

Brinton, W. C., Graphic Methods for Presenting Facts, Engineering
Magazine, New York, 1914, Chapters IX, X, pp. 149-163,
164-199, respectively.

Elderton, W. P., and Ethel M., Primer of Statistics, Black, London,
1910, Chapter III, pp. 23-39.

Fisher, Irving, “The ‘Ratio’ Chart for Plotting Statistics,” in
Quarterly Publications of the American Statistical Association,
June, 1917, pp. 577-601.

Haskell, Allan C., Graphic Charts in Business, Codex Book Company,
New York, 1922, passim.

Karsten, K. G., Charts and Graphs, Prentice-Hall, New York, 1923,
passim.

Kelley, Truman L., Statistical Method, Macmillan & Company,
New York, 1923, Chapter II, pp. 9-43.

King, W. I., Elements of Statistical Method, Macmillan & Company,
New York, 1912, Chapter XI, pp. 97-120.

Marshall, Alfred, “On the Graphic Method of Statistics,” in the
Jubilee Volume, Journal of the Royal Statistical Society, 1885.

Marshall, W. C., Graphical Methods, McGraw-Hill Book Company,
New York, 1921, passim.

Mills, Frederick C., Statistical Methods Applied to Economics and
Business, Holt, New York, 1924, Chapter II, pp. 11-60.

Thorndike, E. L., An Introduction to the Theory of Mental and
Social Measurements, Columbia University, New York, 1916,
Chapter III, pp. 28-41.

Yule, G. U., An Introduction to the Theory of Statistics, Griffin,
London, 1911, Chapter VI, pp. 75-105.



CHAPTER IX 


AVERAGES AS TYPES 
I. Introduction

The discussion in the previous chapters, like Gaul, may be 
divided into three parts. Chapter I defines the subject matter 
of the book; Chapters II to V, inclusive, describe the manner 
in which statistics are assembled and collected from secondary 
and primary sources, respectively; while Chapters VI to VIII, 
inclusive, discuss the ways in which data are arranged in 
tables, and illustrated by diagrams and graphs. The discus- 
sion has to do with the processes of securing and arranging 
series of statistical aggregates rather than with the constants 
which may be used to describe them; it relates more to the 
manner in which they are built up than to the relations 
between the different parts; more to them in the gross, than 
in the net; more to them as details, than as summaries. 

Statistical aggregates make up series of one type or another,
descriptive of complex phenomena in point of time, space, or
condition. The phenomena with which they deal are “affected
to a marked extent by a multiplicity of causes”; they do not
stand alone.^ If they are to be adequately described by
statistics, then the processes to which so much attention has been
given in the foregoing chapters must be carried out with
scrupulous care.

^ “Life and the social process are not made up of bracketed situations
of cause and effect, means and ends, stimulus and response. On the
contrary, life is composed of related and interrelated situations * * *
Life is flow, process. The real search is not for action and reaction,
but interaction.” Lindeman, Eduard C., Social Discovery, Republic
Publishing Company, New York, 1924, p. 44.

Statistical series, however, can rarely be adequately dealt
with without using some kinds of summaries. Comparisons
make them imperative. Expressions which are descriptive
of the characteristics of data are required. Averages of various
types serve this purpose or function.^ The mind craves
some sort of an average when dealing with series of statistical
facts. Interest may be in the average price, the average
student, average sale, average “business conditions” or what
not, when dealing with phenomena of these types. Relations
must be established, and for this purpose the details of series
are too involved. They must be reduced to a single expression
which stands for or reduces them to a unit basis.^

^ “An average * * * in general we may regard as one of a class of
statistical constants * * * which concisely label a set of observations
or measurements pertaining to a common family. It is designed to
describe the family type more nearly than is possible by observing any
chance member, and in value it should therefore come somewhere near
the middle of the family group, so that if the individual members of the
family chance to be equal each to each in respect to the organ or
character observed it should have the same value as they have.” Jones, D.
Caradog, A First Course in Statistics, Bell, London, 1921, p. 23.

^ In speaking of the arithmetic average, Keynes says, “But the utility
of an average generally consists in our supposed right to substitute, in
certain cases, this single measure for the varying measures of which it
is a function.” Keynes, J. M., A Treatise on Probability, Macmillan &
Company, Ltd., London, 1921, p. 205.

Averages are used loosely in everyday life. They often
serve as a cloak for ignorance, people being willing to summarize
their opinions in this way when they have no information
concerning either the function of an average or the detail
which it summarizes. They are used to give general impressions
expressive of one's prejudices, general notions, sympathies
or feelings of what ought to be the case in particular
situations. “Short cuts” of this type are used in making broad
generalizations about affairs for which often no average is
available, and which cannot be summarized in this manner.
Averages are the chief stock in trade of those who are loose
minded, and prone to generalize. The expression, “on the
average,” is greatly overworked, so much so that it is hackneyed.
Its free use suggests, if it does not always indicate,
the unscientific mind. To be scientific is to be able to identify
similarities and differences and to be precise in one's
generalizations about them. The willingness always to use averages
is not in keeping with this requirement.

Rarely, if ever, does an average ^ contain as much significance
as do the detailed data which it summarizes.^ It is
used as a substitute for that which it replaces, but in this
fact lies its chief limitation. The same average amount may
be computed from different details, yet it may be these which
are of chief interest. If averages alone are used, then the
details, except in so far as they are reflected in such summaries,
are ignored. As the formulation of a physical or a natural
law depends upon observation and experiment, so the use of
an average grows out of analysis of statistical detail. It
presupposes (1) a purpose, (2) a knowledge of the peculiarities
of the data to be averaged, (3) a clear conception of the
properties of the appropriate average, and (4) a mastery of
the whole subject to which the data relate so as to be sure
that the average selected will have the proper significance.

II. Common Averages Defined

The averages with which we are concerned are those in com- 
mon use. They are as follows: (1) the arithmetic mean or 
average, (2) the median, (3) the mode, and (4) the geometric 
mean. At this stage of the discussion, definitions of each kind 
will suffice. Their peculiar properties and uses will be dis- 
cussed later. 

^ Watkins speaks of averages as “representative numbers” and as
containing “the gist, if not the substance, of statistics.” Watkins, G. P.,
“Theory of Statistical Tabulation,” Quarterly Publications of the
American Statistical Association, December, 1915, p. 752.

^ Venn, Dr. John, “On the Nature and Use of Averages,” Journal of
the Royal Statistical Society (London), Vol. LIV, 1891, pp. 429-448, at
p. 433.

The arithmetic mean or average is the amount secured by
dividing the sum of the values of the items in a series by their
number.

The median of a series is the value of that item, actual or
estimated, when a series is arranged in order of magnitude,
which divides the distribution into two equal parts.

The mode of the items in a series is the value of the one or
ones which are most characteristic or common. It is the
typical fact and always relates to a condition which is actually
represented.

The geometric mean of the items in a series is the result
secured by multiplying together the values of the various items
and taking the nth root of their product.

These are all averages of the “first” order, that is, they
have to do with the actual items in statistical series. In
contrast to them, we shall later ^ consider averages of the “second”
order, those which summarize not the actual items but the
differences between them and some standard amount.
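
The four definitions can be tried on a small series. The sketch below uses the `statistics` module of the Python standard library (version 3.8 or later); the series itself is hypothetical, and the library's mode is simply the most frequent value, which is narrower than the "most characteristic" value of the definition.

```python
import statistics

items = [2, 4, 4, 8, 16]   # hypothetical series of five items

# Sum of the values divided by their number.
arithmetic_mean = statistics.mean(items)

# Value of the middle item when the series is arranged in order of magnitude.
median = statistics.median(items)

# Most common value or values; a list, since there may be more than one.
modes = statistics.multimode(items)

# nth root of the product of the n items.
geometric_mean = statistics.geometric_mean(items)
```

For this series the arithmetic mean is 6.8, the median and the mode are both 4, and the geometric mean is the fifth root of 4,096, or about 5.28. Only the median and the mode coincide with an item actually present in the series.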

III. The Arithmetic Mean or Average 

1. WHAT IT IS 

The arithmetic mean is the most familiar average in current
use. Indeed, it is the only one customarily employed by the
“man in the street.” To him, an average is the average, the
arithmetic mean about which he learned in his school days and
about which, in its technical aspects, he has given little or no
thought. Its use is a matter of daily routine in business. Why
discuss it in a book on statistical methods? Common use
and the assurance that it is fully understood do not, however,
make a discussion of it unnecessary. It may appear that it
is understood as to method of calculation, but not as to use
and relation to other averages, matters about which little or
nothing is commonly known.

^ Chapter X, passim.

According to definition, the arithmetic mean is the result
secured by adding together the values of the items in a series
and by dividing the total by the number of items. Thus, the
arithmetic mean of 5 and 3 is secured by adding one 5 to one
3 and dividing by 2. The result, 4, is the average. The
differences of the items from the average, plus and minus,
are numerically equal, their algebraic sum being zero. In the
illustration, 5 exceeds 4 by the same amount as 3 falls short
of it. Accordingly, such a statistical constant is the center
of gravity or point of balance of the items in a series. Moreover,
it should be noted that in adding the quantities the
influence of each upon the total is proportional to its size.
On the other hand, in dividing the total by the number of its
constituent parts, the items are treated as equal. Accordingly,
the arithmetic mean is much influenced by the relative
size of the items.
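
The property that the deviations from the arithmetic mean balance one another can be verified directly. The sketch below uses the items 5 and 3 of the illustration; the function name is illustrative.

```python
def deviations_from_mean(items):
    """Return the arithmetic mean and the deviation of each item from it."""
    mean = sum(items) / len(items)
    return mean, [x - mean for x in items]


mean, deviations = deviations_from_mean([5, 3])
balance = sum(deviations)   # algebraic sum of the deviations
```

The mean is 4, the deviations are plus 1 and minus 1, and their algebraic sum is zero.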

Moreover, the same average amount may be secured from a
variety of series. To illustrate: The arithmetic mean of
8, 9, 10, 11, 12, 13, and 14 is 11. So also is 11 the arithmetic
mean of 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13,
13, 13, 14, 14, 14; of 2 and 20; of 9, 9, 4, 22; of 3, 1, 1, 1,
1, 99, 1, 1, 1, 1, 11; and of many other combinations of items
which might be selected. When an average is thus wholly
independent of (1) the order of the items, (2) the number of
items, and (3) their relative size, it has serious limitations
for uses in which the nature of the distribution which is
averaged is of interest. Moreover, this average may never be
represented in a series. This is the case, for example, when
2 and 20, or 9, 9, 4, 22 are averaged. The result is always
the center of gravity, but such a center may not represent an
actual case. It is fictitious in this sense, although real in the
sense that the product secured by multiplying it by the number
of items gives the sum of the parts. Indeed, for the
calculation of this average it is not necessary to know the
size of the items provided the number and their total are
given.
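The series named above can be compared directly. The following Python sketch (an illustration only; the dictionary labels are assumptions) confirms that very different series share one mean, and that the mean can be had from the total and the count alone.

```python
from fractions import Fraction

# The series quoted in the text, keyed by descriptive labels.
series = {
    "seven items": [8, 9, 10, 11, 12, 13, 14],
    "each item three times": [v for v in range(8, 15) for _ in range(3)],
    "two items": [2, 20],
    "four items": [9, 9, 4, 22],
    "one extreme item": [3, 1, 1, 1, 1, 99, 1, 1, 1, 1, 11],
}

means = {name: Fraction(sum(v), len(v)) for name, v in series.items()}

# Only the total and the number of items are needed, not their sizes.
mean_from_total_and_count = Fraction(44, 4)

for name, value in means.items():
    print(name, "->", value)
```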

If an average is to be taken as a substitute for detail,
then the arithmetic mean, in spite of its simplicity and ease
of calculation, has little to recommend it when series are
non-homogeneous. It is true that the average can be
substituted for each item in a series, and the same total be
secured, but substitution of this nature may not be wanted,
the characteristic amounts being of interest. An arithmetic
mean wage-rate, for instance, may tell the management of
a plant the number of equal parts into which its wage bill
is divided, but it does not show what the different employees
actually receive. An arithmetic mean does not necessarily
indicate the nature of the parts of which it is the center
of gravity.

In the more precise measurements of the physical sciences
its use is well established. “If we have n observed values of an
unknown, all equally good so far as we know, the most plausible
value of the unknown (best value on the whole) is the
arithmetic mean of the observed values.” ¹ Speaking further,
the same writers say, “When the number of observed values
is very great, the arithmetic mean is the true value.” ² This
claim is based upon the principle that, in the absence of bias,
large errors or deviations are less frequently encountered
than are those which are small, the errors tending to be
distributed about a true value according to the laws of
probability or chance. That is, positive and negative deviations
of the same size tend to occur with the same frequency.³

The fact that errors in measurements relating to economic
and social phenomena are not subject solely to chance makes
it impossible in such cases to use with assurance the arithmetic
mean as the “true” average. Observations are not
necessarily all “equally good.” They are affected by the
peculiarities of the units, personal bias, changing purposes,
and varying motives. The ways in which these affect
measurements of economic and social phenomena have already
been discussed in earlier chapters.

¹ Wright, T. W., and Hayford, J. F., The Adjustment of Observations,
D. Van Nostrand, New York, 1906, p. 10.

² Ibid., p. 11.

³ Certain mathematical properties of the arithmetic mean are discussed
by Yule, G. U., in An Introduction to the Theory of Statistics, Griffin,
London, 1911, and in Wright and Hayford, op. cit., Chapter I.

2. HOW THE ARITHMETIC MEAN IS COMPUTED 

The fact that the arithmetic mean of a series is its center 
of gravity is illustrated in Figure 61. The series of which the 
mean is to be calculated is given in Table 31. 


TABLE 31

Table Showing Wage-Rates as Bases for the Computation of
a Simple Arithmetic Mean Rate

The Unit or Amount      The Number of Times Each
Averaged                Unit is Encountered
                        (The Weight)

Total   $39.00                    9
          2.00                    1
          4.00                    1
          3.00                    1
          6.00                    1
          3.00                    1
          8.00                    1
          5.00                    1
          3.50                    1
          4.50                    1


The sum of the values of the items, $39, divided by the
number of items, 9, is $4.33. This is the arithmetic mean.
If the different items are suspended as weights upon an
imaginary rod, as in Figure 61, part A, the rod will balance at the
scale unit $4.33. If, to the same units, frequencies (weights) ¹
greater than unity but proportionally the same as in the first
case are assigned, the rod will balance at the same place.
This adjustment is shown in part B of Figure 61. In this
case, the frequencies (weights) have been multiplied throughout
by 4: that is, each of them is made four times as heavy.
If, however, the relations between the frequencies (weights)
are changed, as they are in part C of the figure, then the
average will change; that is, the center of gravity will be
disturbed.

¹ See the discussion, infra, pp. 279-281, on the distinction between a
simple and a weighted series.
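The effect of multiplying every weight by the same factor, as against changing the relations between the weights, may be sketched in Python as follows. The helper `weighted_mean` and the `uneven` weights are assumptions made for the illustration; the units are those of Table 31.

```python
from fractions import Fraction


def weighted_mean(units, weights):
    """Sum of unit-times-weight products divided by the sum of the weights."""
    return sum(u * w for u, w in zip(units, weights)) / sum(weights)


# Wage-rates of Table 31, held as exact fractions.
units = [Fraction(x) for x in
         ("2", "4", "3", "6", "3", "8", "5", "3.5", "4.5")]

unit_weights = [1] * len(units)            # part A: every item once
heavier = [4 * w for w in unit_weights]    # part B: every weight times 4
uneven = [1, 1, 1, 1, 1, 5, 1, 1, 1]       # hypothetical change of relations

print(float(weighted_mean(units, unit_weights)))
print(float(weighted_mean(units, heavier)))
print(float(weighted_mean(units, uneven)))
```

Only the proportions among the weights matter: a common factor cancels between numerator and denominator, while a change in proportions shifts the point of balance.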

FIGURE 61 

Diagrams Illustrating the Nature of the Arithmetic Mean
When Items are Differently Weighted



If the adjustment is made according to chance, the differences
between the two results will be small. Frequencies
(weights) of some sort are always present; the effect which
they have on the average is determined by their relative
size and by their distribution. Taking the same units as
above, and the chance frequencies (weights) given in Table
32, the average is reduced by only $.10 (that is, it is $4.23),
notwithstanding the fact that the difference between the
extreme frequencies (weights) is 7, and that the frequency
(weight) of one item is 4½ times as large as that of another.

TABLE 32

Table Showing Wage-Rates with Number of Persons Receiving
Them as a Basis for Computing an Arithmetic Mean Rate

The Unit or       The Number of Times      Product of the
Amount            Each Unit is             Weight Times
Averaged          Encountered              the Unit
                  (The Weights)

Total                    37                  $156.50
 $2.00                    4                     8.00
  4.00                    3                    12.00
  3.00                    9                    27.00
  6.00                    5                    30.00
  3.00                    2                     6.00
  8.00                    3                    24.00
  5.00                    6                    30.00
  3.50                    3                    10.50
  4.50                    2                     9.00
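The computation behind Table 32 can be reproduced with a short Python sketch. It is offered only as an illustration; the variable names are assumptions, and decimal arithmetic is used so that dollars and cents are exact.

```python
from decimal import Decimal, ROUND_HALF_UP

units = [Decimal(x) for x in
         ("2.00", "4.00", "3.00", "6.00", "3.00",
          "8.00", "5.00", "3.50", "4.50")]
weights = [4, 3, 9, 5, 2, 3, 6, 3, 2]   # chance frequencies of Table 32

# Product of the weight times the unit, row by row.
products = [u * w for u, w in zip(units, weights)]
total_product = sum(products)
total_weight = sum(weights)

mean = (total_product / total_weight).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)

print("sum of products:", total_product)
print("sum of weights:", total_weight)
print("weighted mean:", mean)
```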


By arbitrarily adjusting the frequencies (weights) for each
of the items, the average may be increased or decreased at
will between the largest and smallest values. Column 1,
Table 33, shows frequencies selected in such a manner that
the values larger than the average (when all values are taken
once) are given large frequencies (weights) and those smaller
than the average small frequencies (weights), the importance
varying directly with the size of the unit. In column 2, Table
33, the relative size of the frequencies (weights) is reversed.
Diagrammatically, the effect of choosing such frequencies
(weights) is shown in parts D and E, respectively, of Figure
61.

TABLE 33

Table Showing Wage-Rates with Number of Persons Receiving
Them as a Basis for Computing Arithmetic Mean Rates

                        Col. 1                      Col. 2
The Unit or    The Number of   Products    The Number of   Products
Amount         Times Each      of Units    Times Each      of Units
Averaged       Unit Is         and         Unit Is         and
               Encountered     Weights     Encountered     Weights
               (The Weights)               (The Weights)

Total               39         $195.50         39½         $142.25
 $2.00               2            4.00          8            16.00
  4.00               4           16.00          4            16.00
  3.00               3            9.00          6            18.00
  6.00               6           36.00          3            18.00
  3.00               3            9.00          6            18.00
  8.00               8           64.00          1             8.00
  5.00               5           25.00          3            15.00
  3.50               3½          12.25          5            17.50
  4.50               4½          20.25          3½           15.75

Average                           5.01                        3.60


By thus arbitrarily selecting the frequencies (weights), the
exact sizes being essentially within the limits of those assigned
by chance, the resulting average is increased in the first case
(column 1) over that secured by assigning equal frequencies
by $.68, and over that gotten by assigning chance frequencies
(weights) by $.78. In the second case, the average compared
with that obtained by using equal frequencies (weights) is
decreased by $.73, and when compared with that secured by
using chance frequencies (weights) by $.63. The difference
obtained by arbitrarily selecting the frequencies (weights) is
$1.41 as compared with $.10 when equal and chance
frequencies (weights) are used.
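The four averages compared in this paragraph can be recomputed together. The sketch below is illustrative only; the helper names `cents` and `weighted_mean` are assumptions, and each average is rounded to the nearest cent before the differences are taken, as the text appears to do.

```python
from decimal import Decimal, ROUND_HALF_UP


def cents(value):
    """Round a Decimal amount to the nearest cent."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def weighted_mean(units, weights):
    total = sum(u * w for u, w in zip(units, weights))
    return cents(total / sum(weights))


units = [Decimal(x) for x in
         ("2.00", "4.00", "3.00", "6.00", "3.00",
          "8.00", "5.00", "3.50", "4.50")]

equal = [Decimal(1)] * len(units)
chance = [Decimal(w) for w in (4, 3, 9, 5, 2, 3, 6, 3, 2)]   # Table 32
col_1 = list(units)          # Table 33, col. 1: weight equals size of unit
col_2 = [Decimal(x) for x in
         ("8", "4", "6", "3", "6", "1", "3", "5", "3.5")]    # Table 33, col. 2

m_equal = weighted_mean(units, equal)
m_chance = weighted_mean(units, chance)
m_col_1 = weighted_mean(units, col_1)
m_col_2 = weighted_mean(units, col_2)

print(m_equal, m_chance, m_col_1, m_col_2)
print("arbitrary spread:", m_col_1 - m_col_2)
print("equal vs. chance:", m_equal - m_chance)
```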

The arithmetic mean or average of a series of items is a
function of the importance assigned to each one. It tends to
be larger than the average of an equally weighted series when
large items are heavily weighted, and smaller than it when
small items are heavily weighted. When frequencies (weights)
are chosen at random, the resulting average is usually affected
very little by their absolute size.

By taking the wage-rates above and assigning to them pure
chance frequencies (weights) ¹ (done by drawing by chance
from a group of numbers marked with figures from 1 to 29,
inclusive) the averages in four trials were found to be as
follows: $4.43, $4.26, $4.29, and $4.04. These agree closely
with the result secured when equal frequencies (weights) were
used.
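The student who wishes to try other chance frequencies may do so mechanically. The following Python sketch is one possible way, under stated assumptions: a pseudo-random generator stands in for the drawing of numbered slips, and the seed is arbitrary. The averages it prints will differ from the four quoted above, since the frequencies drawn are different.

```python
import random


def weighted_mean(units, weights):
    return sum(u * w for u, w in zip(units, weights)) / sum(weights)


units = [2.00, 4.00, 3.00, 6.00, 3.00, 8.00, 5.00, 3.50, 4.50]

rng = random.Random(0)   # arbitrary seed, chosen only for repeatability

trials = []
for _ in range(4):
    # One chance frequency per wage-rate, drawn from 1 to 29 inclusive.
    weights = [rng.randint(1, 29) for _ in units]
    trials.append(weighted_mean(units, weights))

equal_mean = sum(units) / len(units)
print("equal weights:", round(equal_mean, 2))
print("chance trials:", [round(t, 2) for t in trials])
```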

The commonly used method of computing arithmetic means
is to total the values of the items and divide by the number
of items. In some cases, however, particularly where there
are many frequency groups and large items, it is easier to
proceed in a different manner. In keeping with the principle
that the sum of the deviations, signs considered, from the
correct average equals zero, an average may be assumed as a
starting point, the deviations calculated and corrected for
error, and the correct result determined. This method of
calculating an average for an ungrouped series of wage-rates
is illustrated in Table 34. The trial average, $5, is assumed.
The sum of the minus deviations is −$10; the sum of the plus
deviations is $4; the algebraic sum is −$6. The trial average
is, therefore, not the correct average. If it were, the algebraic
sum would be zero. Since the net error is −$6, the amount
must be divided by 9, the number of instances, and the quotient
algebraically added to $5. The operation is as follows:

    −$6.00 ÷ 9 = −$.67.   $5.00 + (−$.67) = $4.33,

which is the correct average.

¹ The following are chance frequencies (weights) used in this
experiment: [table not legible in this copy]. (The student is advised
to try others.)

TABLE 34

Table Giving Data for Computing the Arithmetic Mean by the
“Short-Cut” Method

Units or      Frequencies        Deviations          Net
Amounts                         −          +         Deviations

Total              9         $10.00      $4.00        −$6.00
 $2.00             1           3.00
  4.00             1           1.00
  3.00             1           2.00
  6.00             1                      1.00
  3.00             1           2.00
  8.00             1                      3.00
  5.00             1
  3.50             1           1.50
  4.50             1            .50




The same method is followed in series in which the frequen- 
cies are greater than unity. The only additional step involved 
is to multiply the deviations by their respective frequencies. 
This is necessary because the deviations appear as many times 
as the items are encountered. 

This would be apparent at once, if, instead of indicating the 
number of times each item appears, the alternative plan were 
followed of repeating the item itself. In Table 35, the process 
of calculating a mean in this manner is carried out in detail. 

The total net deviation from the assumed average, $5, is
−$93.50. That is, $5 is greater than the true average.
Accordingly, the total net error must be distributed over the 163
items, and the result be algebraically added to $5. The
computations involved are as follows: −$93.50 ÷ 163 =
−$.57. $5.00 + (−$.57) = $4.43, which is the arithmetic
mean.

TABLE 35

Table Giving Data for Computing the Arithmetic Mean by the
“Short-Cut” Method

Units or   Fre-         Deviations       Deviations times      Total Net
Amounts    quencies                      the Frequencies       Deviations
                        −        +          −          +

Total        163                         $161.50     $68.00     −$93.50
 $2.00        25      $3.00                75.00
  4.00        22       1.00                22.00
  3.00        17       2.00                34.00
  6.00        23               $1.00                  23.00
  3.00         1       2.00                 2.00
  8.00        15                3.00                  45.00
  5.00        27
  3.50        12       1.50                18.00
  4.50        21        .50                10.50
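The “short-cut” method lends itself to a small routine. The sketch below is one way to express it in Python, not the author's own notation; the name `shortcut_mean` is an assumption. It is applied first to the unit frequencies of Table 34 and then to the frequencies of Table 35.

```python
from fractions import Fraction


def shortcut_mean(units, freqs, assumed):
    """Mean from an assumed average.

    Returns (mean, net_deviation): the net deviation from the assumed
    average is spread over all items and added back algebraically.
    """
    net = sum((u - assumed) * f for u, f in zip(units, freqs))
    n = sum(freqs)
    return assumed + net / n, net


units = [Fraction(x) for x in
         ("2", "4", "3", "6", "3", "8", "5", "3.5", "4.5")]
assumed = Fraction(5)

# Table 34: every frequency is unity.
mean_34, net_34 = shortcut_mean(units, [1] * 9, assumed)

# Table 35: frequencies greater than unity.
freqs_35 = [25, 22, 17, 23, 1, 15, 27, 12, 21]
mean_35, net_35 = shortcut_mean(units, freqs_35, assumed)

print("Table 34:", float(net_34), round(float(mean_34), 2))
print("Table 35:", float(net_35), round(float(mean_35), 2))
```

The choice of trial average does not affect the result; any assumed value is corrected to the same mean, which is why a convenient round figure may be taken.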

When arithmetic means are to be computed for series which
are grouped, some assumption must be made as to the size of
the items in the respective groups. The conventional method
is to assume that the frequencies in each group are
distributed uniformly throughout its range, or, what amounts to
the same thing, that they are concentrated at the center.
How correct this is, for discrete and continuous series, has
already been considered. In the absence of exact values,
however, since precise amounts must be used, the conventional
method may be followed.

The ordinary way of computing the arithmetic mean for a
grouped series is shown in Table 36, the respective frequencies
being multiplied by the central values of the groups.

TABLE 36

Table Giving Data for Computing an Arithmetic Mean from
Frequency Groups

Units or Amounts        Frequencies      Products of Frequencies
                                         and the Units
                                         (Middle Terms)

Total                       434               $3,923.00
$ 5.00 to $ 5.99             15                   82.50
  6.00 to   6.99             40                  260.00
  7.00 to   7.99             66                  495.00
  8.00 to   8.99             91                  773.50
  9.00 to   9.99            113                1,073.50
 10.00 to  10.99             49                  514.50
 11.00 to  11.99             30                  345.00
 12.00 to  12.99             27                  337.50
 13.00 to  13.99              2                   27.00
 14.00 to  14.99              1                   14.50

$3,923 ÷ 434 = $9.04 = arithmetic mean or average.
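The grouped computation can be sketched in Python as below. The sketch assumes, as the products in the table do, that each group is represented by a middle term half a dollar above its lower limit; the variable names are illustrative.

```python
from fractions import Fraction

lower_limits = range(5, 15)                  # $5.00, $6.00, ... $14.00
freqs = [15, 40, 66, 91, 113, 49, 30, 27, 2, 1]

# Middle term of each one-dollar group, e.g. $5.50 for $5.00 to $5.99.
midpoints = [Fraction(low) + Fraction(1, 2) for low in lower_limits]

products = [m * f for m, f in zip(midpoints, freqs)]
total = sum(products)
n = sum(freqs)
mean = total / n

print("sum of products:", float(total))
print("number of items:", n)
print("arithmetic mean:", round(float(mean), 2))
```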


TABLE 37

Table Giving Data for Computing an Arithmetic Mean by the
“Short-Cut” Method for Frequency Groups from an
Assumed Average

Units or Amounts     Fre-       Deviations from    Products of          Net
                     quencies   the Assumed        Deviations and       Deviations
                                Average, $9.50     Frequencies
                                  −        +          −         +

Total                  434                         $403.00   $203.00    −$200.00
$ 5.00 to $ 5.99        15      $4.00                60.00
  6.00 to   6.99        40       3.00               120.00
  7.00 to   7.99        66       2.00               132.00
  8.00 to   8.99        91       1.00                91.00
  9.00 to   9.99       113
 10.00 to  10.99        49               $1.00                 49.00
 11.00 to  11.99        30                2.00                 60.00
 12.00 to  12.99        27                3.00                 81.00
 13.00 to  13.99         2                4.00                  8.00
 14.00 to  14.99         1                5.00                  5.00

If the method of computing the deviations from an assumed
average is used, the steps are the same as those used when
data are not arranged in groups, except that it is necessary, as
in the case immediately above, to assume a uniform distribution
throughout each group. The method is shown in Table
37, the trial average being $9.50, i.e., the item half-way
through the group, $9.00 to $9.99.

−$200 ÷ 434 = −$.46. That is, the net average deviation
does not equal zero, but −$.46. Therefore, in order to
determine the true average (from which the sum of the deviations
equals zero) it is necessary to add −$.46 to the assumed
average, $9.50, thus giving $9.04 as the correct average.

The plus and minus deviations, calculated in the same manner
but from the actual average, $9.04, are given in Table 38.

TABLE 38

Table Showing the Effect of Computing the Arithmetic Mean
from the True Average for Data in Frequency Groups

Units or Amounts     Fre-       Deviations from    Products of          Net
                     quencies   the True           Deviations and       Deviations
                                Average, $9.04     Frequencies
                                  −        +          −         +

Total                  434                         $305.48   $305.12     −$.36 *
$ 5.00 to $ 5.99        15      $3.54                53.10
  6.00 to   6.99        40       2.54               101.60
  7.00 to   7.99        66       1.54               101.64
  8.00 to   8.99        91        .54                49.14
  9.00 to   9.99       113               $ .46                 51.98
 10.00 to  10.99        49                1.46                 71.54
 11.00 to  11.99        30                2.46                 73.80
 12.00 to  12.99        27                3.46                 93.42
 13.00 to  13.99         2                4.46                  8.92
 14.00 to  14.99         1                5.46                  5.46

* This negligible difference is due to the fact of taking the average at
$9.04. The exact average is $9.039+.
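That the small residue comes only from rounding the average to the nearest cent can be verified with a brief Python sketch (illustrative only; the names are assumptions). Deviations are taken from the rounded average, $9.04, and the plus and minus products are summed separately.

```python
from decimal import Decimal

freqs = [15, 40, 66, 91, 113, 49, 30, 27, 2, 1]
midpoints = [Decimal(low) + Decimal("0.5") for low in range(5, 15)]
rounded_mean = Decimal("9.04")

# Product of each deviation (sign considered) and its frequency.
products = [(m - rounded_mean) * f for m, f in zip(midpoints, freqs)]

minus_total = -sum(p for p in products if p < 0)
plus_total = sum(p for p in products if p > 0)
net = plus_total - minus_total

print("minus:", minus_total, "plus:", plus_total, "net:", net)
```

Had the exact average been used, the net figure would be zero; the residue equals the number of items times the rounding error in the average.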
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When frequency groups are all of equal size, it is often a 
saving of time to compute the deviations from an assumed 
average in terms of the “steps” which successive groups are 
above or below the group containing the assumed average, and 
later to convert the net “step-deviations” back into real
deviations by multiplying by 1 in case the step is unity, by 2 in
case it is two, by ½ in case it is one half, etc. Using the
distribution in Table 38, but assuming a different average, the
arithmetic mean is computed by the “step” method in Table 39.

TABLE 39

Table Giving Data for Computing the Arithmetic Mean by
the “Step-Deviation” Method for Frequency Groups
from an Assumed Average

Units or Amounts     Fre-       “Step-Deviations”    Products of        Net “Step-
                     quencies   from the Assumed     “Steps” and        Deviations”
                                Average, $12.50      Frequencies
                                  −        +           −        +

Total                  434                           1506       4        −1502
$ 5.00 to $ 5.99        15        7                   105
  6.00 to   6.99        40        6                   240
  7.00 to   7.99        66        5                   330
  8.00 to   8.99        91        4                   364
  9.00 to   9.99       113        3                   339
 10.00 to  10.99        49        2                    98
 11.00 to  11.99        30        1                    30
 12.00 to  12.99        27
 13.00 to  13.99         2                 1                    2
 14.00 to  14.99         1                 2                    2

−1502 ÷ 434 = −3.46. −3.46 × $1.00 (the size of the
group) = −$3.46. $12.50 (the assumed average) + (−$3.46)
= $9.04 = the true average.
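The “step” method may be written out as a short Python sketch. It is an illustration under the same assumptions as Table 39 (groups one dollar wide, assumed average $12.50); the variable names are not from the text.

```python
from fractions import Fraction

freqs = [15, 40, 66, 91, 113, 49, 30, 27, 2, 1]
midpoints = [Fraction(low) + Fraction(1, 2) for low in range(5, 15)]

width = Fraction(1)          # size of each group, $1.00
assumed = Fraction(25, 2)    # assumed average, $12.50

# Deviations counted in whole steps above or below the assumed group.
steps = [(m - assumed) / width for m in midpoints]

net_steps = sum(s * f for s, f in zip(steps, freqs))
n = sum(freqs)

# Convert the net step-deviation back into dollars and correct the guess.
mean = assumed + (net_steps / n) * width

print("steps:", [int(s) for s in steps])
print("net step-deviations:", net_steps)
print("mean:", round(float(mean), 2))
```

Working in steps keeps every multiplication in small whole numbers; the single multiplication by the group width is left to the end.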

Where groups are not uniform in size, this method cannot
be employed without considerable difficulty. When they are
uniform, however, multiplying is simplified by computing the
deviations in round numbers. The deviations, however, are
in “steps,” and they must be converted into the units of the
series by multiplying them by the appropriate factor. The
group in this case is $1.00, hence the factor is $1.00.

Table 40 illustrates the method to be used when groups are
of unequal size. In such cases it is generally simpler to
proceed in the regular manner by multiplying through in the first
instance.

TABLE 40

Table Giving Data for Computing the Arithmetic Mean by
the “Step-Deviation” Method from an Assumed Average
When the Groups Are of Unequal Size *

        Groups                    Fre-       “Step-        Products of         Net
                                  quencies   Deviations”   “Steps” and         “Step-
    Size          Width  Center                            Frequencies         Deviations”
                                              −      +       −         +

Total                              30,454

Total                              24,885                  13,976    15,242     +1266 ‡
    Less than 6¢ †    2     5          99     4               396
    6¢ to 8¢          2     7         661     3             1,983
    8¢ to 10¢         2     9       2,722     2             5,444
    10¢ to 12¢        2    11       6,153     1             6,153
(1) 12¢ to 14¢        2    13       6,007
    14¢ to 16¢        2    15       4,926            1                4,926
    16¢ to 18¢        2    17       2,635            2                5,270
    18¢ to 20¢        2    19       1,682            3                5,046

Total                               5,076                   2,604       468     −2136 §
    20¢ to 25¢        5    22.5     2,604     1             2,604
(2) 25¢ to 30¢        5    27.5     2,004
    30¢ to 35¢        5    32.5       468            1                  468

Total                                 291
(3) 35¢ to 45¢       10    40         291                                          ‖

Total                                 202                     109        33       −76 ¶
    45¢ to 60¢       15    52.5       109     1               109
(4) 60¢ to 75¢       15    67.5        60
    75¢ and over †   15    82.5        33            1                   33

* Data taken from Report of the Tariff Board on Schedule “K,” Vol.
IV, Part 5, House Doc. 342, 62d Congress, 2d Session, p. 997.

† Width of group assumed to be the same as that of the class to which
it belongs.

‡ +1266 ÷ 24,885 = .0509. .0509 × 2¢ (the width of the group) =
$.001018. $.13 + $.001018 = $.1310 (average of the first group).
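The two routes described for groups of unequal size can be compared in a Python sketch. It is illustrative only: amounts are in cents, the helper `section_summary` is an assumption, and the centers 17, 19 and 52.5 and the frequency 109 are reconstructed from the printed totals rather than read directly from the table.

```python
from fractions import Fraction

# Each section: (group width, assumed center, [(center, frequency), ...]).
# Amounts are in cents. Centers 17, 19, 52.5 and frequency 109 are
# reconstructed from the section totals (an assumption of this sketch).
sections = [
    (2, 13, [(5, 99), (7, 661), (9, 2722), (11, 6153),
             (13, 6007), (15, 4926), (17, 2635), (19, 1682)]),
    (5, Fraction(55, 2), [(Fraction(45, 2), 2604),
                          (Fraction(55, 2), 2004),
                          (Fraction(65, 2), 468)]),
    (10, 40, [(40, 291)]),
    (15, Fraction(135, 2), [(Fraction(105, 2), 109),
                            (Fraction(135, 2), 60),
                            (Fraction(165, 2), 33)]),
]


def section_summary(width, assumed, groups):
    """Step-deviation method within one section of equal-width groups."""
    n = sum(f for _, f in groups)
    net_steps = sum(Fraction(c - assumed) / width * f for c, f in groups)
    mean = assumed + net_steps / n * width
    return n, net_steps, mean


summaries = [section_summary(*s) for s in sections]
total_n = sum(n for n, _, _ in summaries)

# Route 1: combine the section averages, each weighted by its frequency.
overall_by_steps = sum(n * m for n, _, m in summaries) / total_n

# Route 2: the regular manner, multiplying every center by its frequency.
direct = sum(Fraction(c) * f for _, _, g in sections for c, f in g) / total_n

for n, net, m in summaries:
    print(n, net, round(float(m), 2))
print("overall (cents):", round(float(overall_by_steps), 2))
```

Both routes give the same overall figure, which bears out the remark that multiplying through directly is generally the simpler course when widths differ.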


3. SOME “DO'S AND DON'TS” IN THE USE OF AVERAGES

(1) Do Not Average Averages Unless They Are Properly
Weighted

Example

It is desired to secure the arithmetic average of the following series
separately and combined:

Series 1: $3, $4, $4, $5; Series 2: $2, $6, $7.

Computation, Series 1: $3 + $4 + $4 + $5 = $16. $16 ÷ 4 = $4.

Computation, Series 2: $2 + $6 + $7 = $15. $15 ÷ 3 = $5.

Computation, Combined Series, Correct: $3 + $4 + $4 + $5 + $2
+ $6 + $7 = $31. $31 ÷ 7 = $4.43.

Computation, Combined Series, Incorrect: $4 + $5 = $9. $9 ÷ 2
= $4.50.

(Notes to Table 40, continued)

§ −2136 ÷ 5076 = −.421. −.421 × 5¢ (the width of the group) =
−$.02105. $.275 + (−$.02105) = $.254 (average of the second group).

‖ $.40 is the average of the third group.

¶ −76 ÷ 202 = −.376. −.376 × 15¢ (the width of the fourth
group) = −$.05640. $.675 + (−$.05640) = $.6186 (average of the
fourth group).

Example ¹

It is desired to compute the average percentage relation of rent
to sales for the experience shown in the following table:

Net Sales           Total Sales       Total Rent      Per Cent
(in 000's)                                            Rent to Sales

Under $40            $8,471,952         $255,845          3.02
$40 to 80            20,719,729          545,733          2.63
 80 to 180           26,232,605          729,026          2.78
180 and over         30,555,976          737,008          2.41

Total               $85,980,262       $2,267,612          2.64

Correct method: $2,267,612 ÷ $85,980,262 = 2.64 per cent.

Incorrect method: (3.02 + 2.63 + 2.78 + 2.41) ÷ 4 = 2.71 per cent.
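Both examples under this rule can be checked in Python. The sketch is illustrative; the function and variable names are assumptions. It shows that averaging the group averages is correct only when each is weighted by the amount it stands for.

```python
from fractions import Fraction


def mean(values):
    return Fraction(sum(values), len(values))


# Wage example: two series of unequal length.
series_1 = [3, 4, 4, 5]
series_2 = [2, 6, 7]

combined = mean(series_1 + series_2)
unweighted = (mean(series_1) + mean(series_2)) / 2
weighted = ((mean(series_1) * len(series_1)
             + mean(series_2) * len(series_2))
            / (len(series_1) + len(series_2)))

# Rent example: per cent of rent to sales by size of store.
sales = [8_471_952, 20_719_729, 26_232_605, 30_555_976]
rent = [255_845, 545_733, 729_026, 737_008]

correct_pct = Fraction(100 * sum(rent), sum(sales))
class_pcts = [Fraction(100 * r, s) for r, s in zip(rent, sales)]
incorrect_pct = sum(class_pcts) / len(class_pcts)

print(round(float(combined), 2), float(unweighted), round(float(weighted), 2))
print(round(float(correct_pct), 2), round(float(incorrect_pct), 2))
```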


(2) Do Not Confuse Simple and Weighted
Arithmetic Averages

An arithmetic average computed from series in which the
frequencies are greater than unity is not necessarily weighted.

a. Computation of Simple Arithmetic Averages for Series
(1) in Which the Frequencies Are Unity in Each Case, and
(2) in Which They Are Greater than Unity

“A”

Wage-Rates       Number       Product:
                              Number Times Rate

    $5              1              $5
     6              1               6
     7              1               7
     8              1               8
     9              1               9

Total               5             $35

Average = $35 ÷ 5 = $7.

“B”

Wage-Rates       Number       Product:
                              Number Times Rate

    $5              2             $10
     6              3              18
     7              1               7
     8              2              16
     9              2              18

Total              10             $69

Average = $69 ÷ 10 = $6.90.


¹ See Secrist, Horace, “A Statistical Paradox” in Journal of the
American Statistical Association, June, 1923, pp. 776-780.

The two series are in reality the same since Type “B” may
be written in the form of Type “A” as follows:

Wage-Rates       Number

    $5              1
     5              1
     6              1
     6              1
     6              1
     7              1
     8              1
     8              1
     9              1
     9              1

Total $69          10

Average = $69 ÷ 10 = $6.90.
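The equivalence of the two forms can be shown mechanically. In the Python sketch below (illustrative; the names are assumptions), the series with frequencies is expanded into one entry per worker, and the two averages are compared.

```python
from fractions import Fraction

rates = [5, 6, 7, 8, 9]
numbers = [2, 3, 1, 2, 2]     # frequencies of Type "B"

# Rewrite Type "B" in the form of Type "A": repeat each rate once
# for every time it is encountered.
expanded = [r for r, k in zip(rates, numbers) for _ in range(k)]

mean_from_frequencies = Fraction(
    sum(r * k for r, k in zip(rates, numbers)), sum(numbers))
mean_from_expanded = Fraction(sum(expanded), len(expanded))

print(expanded)
print(float(mean_from_frequencies), float(mean_from_expanded))
```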


b. Computation of Weighted Arithmetic Averages

A weighted arithmetic average is one secured by applying
to the items weights determined by some evidence of importance
other than that associated with the items themselves.¹

Example 1

Per Cent of         Relative Condition       Product of Per Cent
Total Acreage       of Crop                  Acreage and
                                             Relative Condition

   7/10             good = 2                     14/10
   2/10             fair = 3                      6/10
   1/10             poor = 5                      5/10

Total                                            25/10

Average condition = 25 ÷ 10 = 2.5.

¹ “The multiplying of a score by the number of cases having it has at
times been called weighting, but in this text the term will be used to
mean the multiplying of scores by amounts determined not at all, or not
solely, by the population, but from other evidences of importance.” Kelley,
T. L., Statistical Method, Macmillan & Company, New York, 1923, p. 68.

Example 2

Types of        Number on       Relative          Productivity Index
Employees       Payroll         Productivity      Times Number

Men                 5               1                  5
Women               4               ¾                  3
Youths              3               ½                  1½

Total men-equivalents = 9½

Example 3

Family Budget       Relative           Per Cent Increase      Per Cent Increase
Item                Importance in      July, 1914, to         Multiplied
                    Family Budget      November, 1920         by Weights
                    “The Weights”

Food                    43.1%                 93                  4008.3
Shelter                 17.7%                 66                  1168.2
Clothing                13.2%                128                  1689.6
Fuel & lighting          5.6%                100                   560.0
Sundries                20.4%                 92                  1876.8

Total                  100.0%                                     9302.9

Average = 9302.9 ÷ 100 = 93.03 per cent.
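Weighting by outside evidence of importance follows one rule in all three examples: multiply each item by its weight and divide by the sum of the weights. The Python sketch below illustrates this for the crop and family-budget examples; the function name is an assumption, and the sundries weight of 20.4 per cent is inferred from the total of 100 per cent.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction


def weighted_average(items, weights):
    """Sum of item-times-weight products over the sum of the weights."""
    return sum(i * w for i, w in zip(items, weights)) / sum(weights)


# Example 1: crop condition weighted by share of total acreage.
acreage = [Fraction(7, 10), Fraction(2, 10), Fraction(1, 10)]
condition = [2, 3, 5]
crop_average = weighted_average(condition, acreage)

# Example 3: price increases weighted by importance in the family budget.
# The 20.4 for sundries is inferred from the 100.0 per cent total.
budget_weights = [Decimal(x) for x in ("43.1", "17.7", "13.2", "5.6", "20.4")]
increases = [93, 66, 128, 100, 92]
budget_average = weighted_average(increases, budget_weights).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)

print("crop condition:", float(crop_average))
print("budget increase:", budget_average)
```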


(3) Distinguish Between Including and Not Including
“Zero” Cases in an Average ¹

Average tariff duty ²
    Zero cases included:       Amount of duty collected
                               ÷ Value of imports
    Zero cases not included:   Amount of duty collected
                               ÷ Value of imports paying duty

Average daily wage
    Zero cases included:       Total wages paid per year
                               ÷ Number of days in a year
    Zero cases not included:   Total wages paid
                               ÷ Number of full days worked
                                 for which wages were paid

Average amount of taxes paid
    Zero cases included:       Total taxes ÷ Number of people
    Zero cases not included:   Total taxes ÷ Number of tax payers

Average consumption of liquor
    Zero cases included:       Liquor consumed ÷ Total population
    Zero cases not included:   Liquor consumed
                               ÷ Total number of consumers

Average number of accidents per day ³
    Zero cases included:       Number of accidents ÷ Number of days
    Zero cases not included:   Number of accidents
                               ÷ Number of days on which
                                 accidents occurred

¹ See supra, pp. 80-81, 89, for a discussion of an analogous problem
relative to statistical ratios or coefficients.

² See Secrist, Horace, Readings and Problems in Statistical Methods,
Macmillan & Company, New York, 1920, pp. 334-341.

³ pp. 164-184.
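How much the choice of divisor matters may be seen from a small Python sketch. The daily accident counts are hypothetical figures invented for the illustration; they do not come from the text.

```python
# Hypothetical record of accidents on ten successive days.
accidents_per_day = [0, 2, 0, 0, 1, 3, 0, 0, 0, 2]

total_accidents = sum(accidents_per_day)

# Zero cases included: divide by every day in the period.
including_zero = total_accidents / len(accidents_per_day)

# Zero cases not included: divide only by days on which accidents occurred.
days_with_accidents = [d for d in accidents_per_day if d > 0]
excluding_zero = total_accidents / len(days_with_accidents)

print("per day, all days:", including_zero)
print("per day, accident days only:", excluding_zero)
```

The numerator is the same in both cases; the two averages answer different questions, and the one intended should be stated.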
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4. SUMMARY 

In summarizing the discussion of the arithmetic mean,
attention should be called to the fact that it is (1) easily
understood, (2) readily calculated, (3) in everyday use, and
(4) affected by all of the items in a series. Indeed, when
nothing more is wanted, as a summarizing expression, than
the total divided by the number of the parts, it thoroughly meets
the need. But in statistical analysis of economic problems
requirements generally run far beyond this. Details, as well
as averages, or at least averages other than the arithmetic
mean, are required. It is to a discussion of these that
attention is now turned.

IV. The Median

1. WHAT THE MEDIAN IS

The median of a series has been defined as the value of
that item (actual or estimated), when a series is arranged in
order of magnitude, which divides the number of frequencies
into two equal parts. It is in the nature of an average, but
in fact is a “partition expression,” being the value of the
middle item when series are arranged in order of size. It may
or may not be representative of the different values; whether
it is depends upon the nature of the distribution involved.
Moreover, unlike the arithmetic mean, the sum
of the amounts is not secured (except in “normal” distributions,
where the arithmetic mean and the median are the same) by
multiplying the median by the number of items. The amounts
are not added and averaged; they are arrayed. Again, in
calculating it, each item, whether large or small, is assigned the
same importance, all frequencies being treated alike. The
exact size of all of the items except the median one may be
unknown, and yet it can be determined, because the only
requirement for its calculation is that the items be arrayed in
order of magnitude and the center one chosen. Moreover, like
the arithmetic mean, the median may be a value not found
in a series; it may be an estimated rather than an actual
amount.

2. HOW THE MEDIAN IS DETERMINED

Since the median is the value of the middle item in a series,
it is calculated by using the following formulae: if the number
of measurements, n, is odd, use (n + 1)/2; if it is even, the median
value lies between n/2 and (n/2 + 1). If, however, “the
value of a measure is the value of its mid-point, this (the value
of the measure at (n + 1)/2) is equivalent to saying that the
median is the limit of the range covered by n/2 measures counted
either down from the top or up from the bottom.” ¹

The manner in which the median is computed in an
ungrouped series made up of an odd number of items is shown
in Table 41. By using the data in Table 31, p. 267, but
rearranging the units in an ascending order (an unnecessary
step in computing the arithmetic mean), the series is shown
in Table 41.

TABLE 41

Table Giving Data for Computing the Median

Unit          Frequencies

Total              9
$2.00              1
 3.00              1
 3.00              1
 3.50              1
 4.00              1
 4.50              1
 5.00              1
 6.00              1
 8.00              1

¹ Kelley, T. L., Statistical Method, Macmillan & Company, New York,
1923, pp. 55-56.

Applying the formula, (n + 1)/2, when n = 9, we get (9 + 1)/2 = 5,
i.e., the fifth item divides the series into two equal parts.
Counting down from the smallest item, or up from the largest
one (a matter of indifference), $4.00 is found to be the median.
It should be noticed that the total frequencies, rather than the
range of the size of the items, are divided in half. In the
illustration, $4.00 is only $2.00 away from the first item, but
$4.00 away from the last. Moreover, in determining the
median in this case, $2.00 is of as much importance as is $8.00.
It is quite different, of course, respecting the arithmetic mean.
Moreover, while retaining the frequencies as above, every item
in the series except the middle one may be changed (the only
limitation being that the order must remain ascending) and
the median remain the same. Various adjustments of this
type are given in Table 42.

The median in every case is the fifth item, $4.00. It is not
affected at all by changing the size of the items above or below
the fifth one so long as the number of items remains the same
and the series is ascending. Indeed, it is not affected by the
addition of other items provided as many less than the median
as more than it are added. On the other hand, the
arithmetic mean is determined by both the number and size
of the items. The quantity $10,000 in column 6 has 5000
times as much influence as has the quantity $2.00 in
determining the arithmetic mean. But they have equal influence
in fixing the median since each one is represented once. The
median, therefore, thought of as an average to be substituted
for the different items in a series, may be used only when (1)
the differences between the consecutive items are small, or (2)
the series is of the normal law of error type, the items at or
near the median being the most common. In the latter case,
the median is the same as the arithmetic mean, deviations in
excess and in defect of it tending to be distributed about a
true value according to the law of chance. Under such
conditions, it is as much the “true” average, in the mathematical
sense, as is the arithmetic mean. But the two averages are
rarely equal for the simple but sufficient reason that normal
distributions are seldom, if ever, found.

TABLE 42

Table Giving Data Showing the Effect of Changes of
Distribution on the Median and the Arithmetic Mean

Frequencies                    Units and Illustrations
                 1st       2d        3d        4th       5th         6th

Total 9
     1          $2.00     $1.00     $3.99     $4.00     $ .25        $2.00
     1           3.00      1.00      3.99      4.00       .50         3.00
     1           3.00      1.00      3.99      4.00       .75         3.00
     1           3.50      1.00      3.99      4.00      1.00         3.50
     1           4.00      4.00      4.00      4.00      4.00         4.00
     1           4.50      4.00      4.01      4.00      4.00         4.50
     1           5.00      4.00      4.01      4.00      4.00         5.00
     1           6.00      4.00      4.01      4.00      4.00         6.00
     1           8.00      4.00      4.01      4.00      4.00    10,000.00

Median           4.00      4.00      4.00      4.00      4.00         4.00
Arith. Mean      4.33      2.67      4.00      4.00      2.50     1,114.56
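The contrast between the two averages can be recomputed from the six illustrations of Table 42. The Python sketch below is illustrative only; it relies on the standard-library function `statistics.median`, and the other names are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median


def to_decimals(values):
    return [Decimal(v) for v in values]


# The six illustrations of Table 42, each an ascending series of 9 items.
illustrations = [
    to_decimals(["2.00", "3.00", "3.00", "3.50", "4.00",
                 "4.50", "5.00", "6.00", "8.00"]),
    to_decimals(["1.00"] * 4 + ["4.00"] * 5),
    to_decimals(["3.99"] * 4 + ["4.00"] + ["4.01"] * 4),
    to_decimals(["4.00"] * 9),
    to_decimals(["0.25", "0.50", "0.75", "1.00"] + ["4.00"] * 5),
    to_decimals(["2.00", "3.00", "3.00", "3.50", "4.00",
                 "4.50", "5.00", "6.00", "10000.00"]),
]

medians = [median(series) for series in illustrations]
means = [(sum(series) / len(series)).quantize(
             Decimal("0.01"), rounding=ROUND_HALF_UP)
         for series in illustrations]

for med, avg in zip(medians, means):
    print("median:", med, " mean:", avg)
```

The median stays fixed because only the rank of the middle item matters, whereas a single extreme item carries its full size into the mean.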

When the number of items, n, in a series is even, the median 


lies between the and 




items’. If a series is dis- 


crete no actual case appears at such a position. If a median 
amount is selected it is purely arbitrary. If a series is con- 
tinuous, each measure is an approximation to the true measure, 
and, theoretically, items appear between these limits. The 
conventional practice in both cases is to take an amount half- 
way between the middle items. The justification of doing this, 
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however, is different in the two types’ of series. For a series 
which is discrete, the median under such circumstances is 
fictitious; for one which is continuous, it is theoretically 
although not actually present in the series’. 

The calculation of the median of a series containing an even
number of items may be illustrated by adding one item to each
of the series in Table 42. For instance, if an item of $2.00
is added to the series in Illustration 1, the median becomes
$3.75. That is, n is now 10. The two formulae giving
the position of the median will then read as follows:
n/2 = 10/2 = 5, and (n/2) + 1 = 6. The median therefore lies between
the 5th and the 6th item, that is, between $3.50 and $4.00.
It is fixed conventionally at $3.75. If $8.00 is added to the
same series, the median as located by these formulae falls
between $4.00 and $4.50. It may be arbitrarily given the
value of $4.25. Moreover, if to the series in Illustration 2,
$600, $10,000, $12,000, $13,000, and $14,000 are added, the
median is still $4.00. In this case, however, the size of the
median is the same as that of the adjacent items because they
are identical.
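
The rule for an even number of items can be sketched in a few lines of Python. The nine-item series below is hypothetical (it is not Table 42); it is chosen only so that its 4th, 5th, and 6th items are $3.50, $4.00, and $4.50, which is what the illustration above implies. The function names are likewise illustrative.

```python
def median_positions(n):
    """1-based position(s) of the middle item(s) of an ordered series of n items."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)


def median(values):
    """Conventional median: the middle item, or half-way between the two middle items."""
    ordered = sorted(values)
    middle = [ordered[p - 1] for p in median_positions(len(ordered))]
    return sum(middle) / len(middle)


# Hypothetical series, assumed for illustration only.
series = [2.50, 3.00, 3.25, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00]

low_added = median(series + [2.00])    # a small item is added; n becomes 10
high_added = median(series + [8.00])   # a large item is added; n becomes 10
print(median(series), low_added, high_added)
```

Adding a small item moves the middle pair down one place and adding a large item moves it up one place, which is why the two conventional medians differ.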

When data are arranged in frequency groups, the problem 
of determining the median is the same as it is when they are 
not grouped, except that it is necessary arbitrarily to distribute 
the frequencies within the groups in order to interpolate for 
the exact median. What is wanted is not only the median
group, but the median item in the group which divides a series 
in half. To express the units in groups rather than individu- 
ally makes it necessary to approximate the value of each of 
them. For discrete series classified in narrow groups, and for 
all continuous series, the assumption of a uniform distribution 
is sufficiently accurate for most purposes. Any error arising 
from this assumption will be negligible.¹

¹ This is more particularly true since at the median position the
frequencies are generally numerous. This is always the case in
distributions of the normal type and in those which approach it.
A grouped frequency series is shown in Table 43. In this
case, n is 434: that is, it is an even number. On the assumption
that the items through the groups are uniformly dispersed,
and that it is admissible to compute the exact median, the
process is as follows: n/2 = 434/2 = 217, and (n/2) + 1 = 218.
The median, therefore, lies in the group containing the 217½th
item. The value of this item is the median.


TABLE 43

Table Giving Frequency Data for the Computation of the Median

Units or Amounts        Frequencies

Total                       434

$ 5.00 to $ 5.99             15
  6.00 to   6.99             40
  7.00 to   7.99             66
  8.00 to   8.99             91
  9.00 to   9.99            113
 10.00 to  10.99             49
 11.00 to  11.99             30
 12.00 to  12.99             27
 13.00 to  13.99              2
 14.00 to  14.99              1


By counting down from the smallest item, the group $9.00
to $9.99 is found to contain all the items between 212 and 325.
The 217½th man's wage-rate is, therefore, located within this
group. On the assumption that the 113 men whose wage-rates
fall within the group $9.00 to $9.99, inclusive, are uniformly
distributed in the order of the size of their rates, the wage-rate
which is half-way between that received by the 217th and the
218th man is 5½/113 × $1.00, or $.05 greater than $9.00, i.e., than
the amount received by the first man in this group.¹ This
gives a median wage-rate of $9.05 which corresponds very
closely to the arithmetic mean, $9.04, as computed for the
same data (Table 36).
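
The interpolation just described may be sketched as follows, using the frequencies of Table 43. This is only an illustration under the text's own assumption of uniform distribution within the group; the function name is hypothetical, and the half-way position is taken as (n + 1)/2, that is, the 217½th item when n is 434.

```python
# Table 43: lower limit of each $1.00 group and its frequency.
groups = [(5.00, 15), (6.00, 40), (7.00, 66), (8.00, 91), (9.00, 113),
          (10.00, 49), (11.00, 30), (12.00, 27), (13.00, 2), (14.00, 1)]
WIDTH = 1.00


def grouped_median(groups, width):
    """Interpolate for the median, assuming items are spread uniformly in each group."""
    n = sum(freq for _, freq in groups)
    position = (n + 1) / 2              # half-way between the 217th and 218th items
    below = 0                           # items counted in the groups already passed
    for lower, freq in groups:
        if below + freq >= position:
            return lower + (position - below) / freq * width
        below += freq
    raise ValueError("the series is empty")


median_wage = grouped_median(groups, WIDTH)
print(round(median_wage, 2))
```

Here 212 items lie below $9.00, so the median position falls 5½ items into a group of 113, and the result rounds to the $9.05 given above.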

Since this example has to do with wage-rates (a discrete
series), the median might with sufficient accuracy be given the
"approximate" value of $9.05 since it falls in the lowest
quarter of the group $9.00 to $9.99.

How precisely a median should be determined depends
largely upon the nature of the distribution. The regularity
of this series justifies greater nicety in its computation than
is typical of most discrete series. Arbitrarily to give it an
exact value, however, where it is evident that the differences
between the units are clearly unequal, is to allow the ideal
position of the terms in the group to rob it of much of its
significance. This is true only if the median is considered
to be more than a mathematical center. It should be interpreted
in connection with the kind of series² with which it is

¹ In order to have the 113 men distributed throughout this group
uniformly and to have the same apply to the groups immediately following
and preceding, it would be impossible to assign a man to the last unit of
a preceding group and to the first unit of the succeeding group. To do
this would result in a concentration at this point. Zizek, in discussing
an analogous point, says: "We can distribute 10 values in a class of 200
cents breadth so that the first and the last values coincide with the
limiting values of the class; so that the first item coincides with the inferior
limit while the last value is as far distant from the superior limit as are
the items from each other; or, so that the last item coincides with the
superior limit while the first item is as far distant from the inferior limit
as are the items from each other. None of these three distributions seems
to be free from objection. The first kind of distribution, if carried out
in the adjoining classes, would give two items at each class limit. The
second and third kinds of distribution do not correspond at all to the
postulate of a uniform distribution within the classes. The most correct
way of distributing the items uniformly is to assume that they occur at
equal intervals even when this distribution is extended to the adjoining
classes. To fulfill this condition the first and last of the items belonging
to the class must be removed from the class limits to a distance which
corresponds to half the magnitude of the interval existing between the
items belonging to the class." Statistical Averages, pp. 208-209.

² In the Dewey Report on Employees and Wages, the median is
expressed only by group location, and this notwithstanding the fact that
the groups are small and the series exceptionally regular.
used. If, in the nature of the case, it can be located with
precision, then it should be so located; if otherwise, then it 
should be given an approximate value. 

By extending the principle according to which medians are 
located, series may be divided into any number of parts. The 
values of the items dividing a complete series into four equal 
parts or the halves into two equal parts are called quartiles.
The dividing position for the lower-half is known as the first
quartile, or Q1; and of the upper-half, the third quartile, or Q3.
Obviously, however, these quarter division marks are not averages
in the same sense as are the arithmetic mean and the
median inasmuch as they have reference to only a part rather 
than to the whole of a series. Indeed, for their location, the 
respective parts become complete series. They are not in the 
same sense typical of, nor may they be considered substitutes 
for, whole series — an implied characteristic or attribute of an 
average per se. 

The first quartile is located with sufficient accuracy by using
the formula (n + 1)/4, where n is the number of items. The third
quartile is located by using 3(n + 1)/4.
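
As a rough illustration of these position formulas, the sketch below applies them to the nineteen yearly cotton importations of Table 44 (in thousands of pounds). The helper names are hypothetical, and the linear interpolation used when a position is fractional is an assumption of this sketch, not a rule stated in the text.

```python
def quartile_positions(n):
    """1-based positions of the first and third quartiles in an ordered series."""
    return (n + 1) / 4, 3 * (n + 1) / 4


def item_at(ordered, position):
    """Value at a 1-based position; interpolates linearly if the position is fractional."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


# Yearly importations of raw cotton, 1895 to 1913, from Table 44 (000's omitted).
amounts = [49332, 55350, 51899, 52660, 50158, 67398, 46631, 98716, 74874, 48841,
           60509, 70964, 104792, 71073, 86518, 86037, 113768, 109780, 121852]

ordered = sorted(amounts)
p1, p3 = quartile_positions(len(ordered))
q1, q3 = item_at(ordered, p1), item_at(ordered, p3)
print(p1, q1, p3, q3)
```

With n = 19 the positions are whole numbers (the 5th and 15th items), so no interpolation is needed in this instance.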

But quartiles (quarters), deciles (tenths), percentiles (one 
hundredths), etc., are not of the nature of averages of the 
first order; that is, as amounts which may be considered as 
types or substitutes for detail. Later, in considering the way 
in which items in series are distributed around their averages, 
we shall have something more to say about them.¹

The median and its kindred partition expressions — quartiles, 
deciles, etc. — are easily located graphically on cumulative 
curves or ogives by (1) dividing the total measure on the 
ordinate scale into the required number of parts, (2) extend- 
ing a line from the point selected parallel to the base or 
abscissa axis until it meets the ogive, and (3) dropping a 
perpendicular at this point until it crosses the abscissa scale. 

¹ See infra, Chapter X, Dispersion.
What the scale reading means depends upon the nature of the
series. If data are discrete and the perpendicular falls between 
the measurements, there is no amount which divides the 
series in half. If the series is continuous in fact and such a 
condition occurs, then a median amount may be assigned by 
assuming (1) that another grouping would give such a result, 
or (2) that another selection of measures expressed in the 
same way would produce such an amount. If data are grouped 
and the perpendicular falls within a group, nice interpolation 
is rarely advisable for discrete although it may be made for 
continuous series. 
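
The three graphic steps have a simple numerical counterpart: divide the total, find where that height meets the ogive, and read off the abscissa. The sketch below does this for the data of Table 43, treating the ogive as straight lines between group limits. Taking the limits as $5.00, $6.00, and so on (rather than $5.99) is an assumption of the sketch, and the function name is hypothetical.

```python
def read_ogive(boundaries, cumulative, target):
    """Abscissa at which a straight-line ogive reaches the cumulative count `target`."""
    for i in range(1, len(boundaries)):
        low, high = cumulative[i - 1], cumulative[i]
        if high > low and low <= target <= high:
            share = (target - low) / (high - low)
            return boundaries[i - 1] + share * (boundaries[i] - boundaries[i - 1])
    raise ValueError("target lies outside the range of the ogive")


frequencies = [15, 40, 66, 91, 113, 49, 30, 27, 2, 1]      # Table 43
boundaries = [5.0 + k for k in range(len(frequencies) + 1)]  # 5.00, 6.00, ..., 15.00

cumulative = [0]
for f in frequencies:
    cumulative.append(cumulative[-1] + f)

total = cumulative[-1]
# Step (1): divide the ordinate into quarters; steps (2) and (3): read the abscissa.
q1, med, q3 = (read_ogive(boundaries, cumulative, total * k / 4) for k in (1, 2, 3))
print(round(q1, 2), round(med, 2), round(q3, 2))
```

Because the ordinate is bisected at exactly half the total (217), the reading is about $9.04; the arithmetic interpolation for the same table works from the 217½th position and gives $9.05. The difference is one of convention only.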

An illustration showing the manner in which the median 
and quartiles are graphically determined in a cumulated fre- 
quency series is given in Figure 48; the way in which it is 
done in a cumulated time series is shown in Figure 62. In 
the latter case, the data shown in Table 44 are used. 

The first half of the raw cotton imported in the period
1895 to 1913, inclusive, came in between 1895 and approximately
September of 1906,¹ that is, during eleven years and
eight months. The second half was imported between September,
1906, and the close of 1913, or during seven years
and four months. The median period (that is, the half-way
period in terms of amounts imported) was September, 1906.
In terms of time alone, June, 1904, is the median period. At
that time, however, only 40.1 per cent of the total had been
imported. These facts are shown graphically on Figure 62.
In order to locate the median period in terms of importations, 
the ordinate axis is bisected at 710,000,000 lbs. and a line 
extended until it meets the historigram (historical graph) 
vertically over the period September, 1906. Obviously, in 
order to locate the median period in terms of time alone, the 
abscissa axis is bisected at June, 1904, and a perpendicular 
raised until it meets the historigram horizontally opposite the 
position 570,000,000 on the ordinate scale. 
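
The same two readings can be approximated arithmetically from Table 44. The sketch below is illustrative only: the function names are hypothetical, and it rests on the assumption, made in the text as well, that importation is uniform within each year. Small differences from the graphic readings are to be expected.

```python
# Table 44: raw cotton imported, in thousands of pounds.
cotton = {1895: 49332, 1896: 55350, 1897: 51899, 1898: 52660, 1899: 50158,
          1900: 67398, 1901: 46631, 1902: 98716, 1903: 74874, 1904: 48841,
          1905: 60509, 1906: 70964, 1907: 104792, 1908: 71073, 1909: 86518,
          1910: 86037, 1911: 113768, 1912: 109780, 1913: 121852}


def period_reaching(imports, share):
    """Year, and fraction of that year elapsed, when cumulative imports reach `share` of the total."""
    target = share * sum(imports.values())
    running = 0
    for year in sorted(imports):
        if running + imports[year] >= target:
            return year, (target - running) / imports[year]
        running += imports[year]
    raise ValueError("share must lie between 0 and 1")


def share_imported_by(imports, year, fraction):
    """Share of the total imported by the time `fraction` of `year` has elapsed."""
    earlier = sum(amount for y, amount in imports.items() if y < year)
    return (earlier + fraction * imports[year]) / sum(imports.values())


median_year, part_of_year = period_reaching(cotton, 0.5)
share_at_mid_1904 = share_imported_by(cotton, 1904, 0.5)
print(median_year, round(part_of_year, 2), round(100 * share_at_mid_1904, 1))
```

The half-way amount is reached roughly three-quarters of the way through 1906, and about 40 per cent of the total had come in by the middle of 1904, in keeping with the readings taken from the graph.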

¹ On the assumption of uniform importation during the year.
TABLE 44

Table Showing by Years Singly and Cumulatively the Quantity
of Raw Cotton Imported into the United States, 1895
to 1913, Inclusive

(Statistical Abstract of the United States, 1913, p. 669)

          Amount of Raw Cotton Imported, in Pounds
                      (000's omitted)

                                     Cumulative
Year      Non-Cumulative    "Up to and      "After and
                            Including"      Including"

Total       1,421,152       1,421,152       1,421,152

1895           49,332          49,332       1,421,152
1896           55,350         104,682       1,371,820
1897           51,899         156,581       1,316,470
1898           52,660         209,241       1,264,571
1899           50,158         259,399       1,211,911
1900           67,398         326,797       1,161,753
1901           46,631         373,428       1,094,355
1902           98,716         472,144       1,047,724
1903           74,874         547,018         949,008
1904           48,841         595,859         874,134
1905           60,509         656,368         825,293
1906           70,964         727,332         764,784
1907          104,792         832,124         693,820
1908           71,073         903,197         589,028
1909           86,518         989,715         517,955
1910           86,037       1,075,752         431,437
1911          113,768       1,189,520         345,400
1912          109,780       1,299,300         231,632
1913          121,852       1,421,152         121,852


If it is desired graphically to locate the median amount in 
an historical series, amounts and not periods must be arrayed 
consecutively and each reported performance counted as a 
frequency of one. When this is done, the process is the same 
as in cumulative frequency series; that is, the amounts cu- 
mulated are plotted on the ordinate and the corresponding 
periods on the abscissa axis. 
FIGURE 62

Cumulative Graphs (Historigrams) Constructed on "Up to
and Including" and "After and Including" Bases, Showing, by
Years, Importations of Raw Cotton into the United States



Objection may be raised as to the propriety of using the 
median for this purpose, yet there seem to be no reasons why 
it is not as useful and significant to divide in this manner a 
time as an amount or frequency series. Indeed, in the business 
world, the occasion for doing the former will probably occur 
more frequently than the latter. When it is desired, for in- 
stance, to distribute expenses over a period, the proportions 
incurred during one quarter or one half of the time may be 
of real significance. Of course, amounts, likewise, may be 
partitioned into equal parts and compared to the time in 
which incurred. In either case, by plotting the amounts cu- 
mulatively and the periods consecutively, the median positions 
may be located and related to each other. 
The necessary steps in determining arithmetically the median
amount imported are given below, and the data arranged
as in Table 45. Place the amounts in numerical order and
apply the formula (n + 1)/2, since n is odd. Thus, n = 19.
(19 + 1)/2 = 10. The 10th or median item is 70,964,000 lbs. That
is, over a period of 19 years the amount imported which stood
half-way between the extremes was 70,964,000 and this occurred
in the year 1906. The arithmetic mean amount imported
is approximately 74,800,000 lbs. The large items in the latter years
largely explain the difference. In this arrangement, order of


TABLE 45

Table Showing Data of Importations of Raw Cotton Arranged
so as to Determine the Median Amount Imported

Periods     Frequencies     Importations in Pounds

Total           19              1,421,152,000

1901             1                 46,631,000
1904             1                 48,841,000
1895             1                 49,332,000
1899             1                 50,158,000
1897             1                 51,899,000
1898             1                 52,660,000
1896             1                 55,350,000
1905             1                 60,509,000
1900             1                 67,398,000
1906             1                 70,964,000
1908             1                 71,073,000
1903             1                 74,874,000
1910             1                 86,037,000
1909             1                 86,518,000
1902             1                 98,716,000
1907             1                104,792,000
1912             1                109,780,000
1911             1                113,768,000
1913             1                121,852,000
magnitude in the amounts rather than continuity of time is
followed. In the former arrangement, the time units are
consecutive.¹
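
The arithmetic steps for the median amount may be sketched as follows. The figures are those of Table 44 in thousands of pounds; the variable names are illustrative only.

```python
# Table 44: raw cotton imported, in thousands of pounds.
cotton = {1895: 49332, 1896: 55350, 1897: 51899, 1898: 52660, 1899: 50158,
          1900: 67398, 1901: 46631, 1902: 98716, 1903: 74874, 1904: 48841,
          1905: 60509, 1906: 70964, 1907: 104792, 1908: 71073, 1909: 86518,
          1910: 86037, 1911: 113768, 1912: 109780, 1913: 121852}

# Arrange by magnitude of amount, not by order of time (as in Table 45).
ranked = sorted(cotton.items(), key=lambda item: item[1])

n = len(ranked)                      # 19, an odd number
position = (n + 1) // 2              # the 10th item
median_year, median_amount = ranked[position - 1]

mean_amount = sum(cotton.values()) / n
print(position, median_year, median_amount, round(mean_amount))
```

Sorting by amount breaks the historical order of the years, which is the point of the rearrangement: the 10th item in order of size happens to belong to 1906.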

3. SUMMARY 

The median as an average or summarizing expression should 
be used with great care. While in its computation all fre- 
quencies are required, it is not affected by the size of the items 
except at or near the middle of a series. This may be a 
significant weakness when not only the number of times an
item appears but also its positive size is important. Theoreti- 
cally, it is best suited to continuous series or to discrete series 
in which the measurements are numerous and accurate, and 
when the scale is small and the groups into which they are 
merged narrow. It should be considered only as one sum- 
mary of a distribution, and be compared with the arithmetic 
mean, and the mode whenever possible. 

V. The Mode 
1. WHAT THE MODE IS

The mode strictly defined is the value of that item in a
series which is most characteristic or common. It is the
typical measurement, the one which is found the greatest
number of times. But not all series possess a single or even
a well-defined mode. Some have more than one mode, while
others can scarcely be said to have a mode at all. The mode,
therefore, is frequently indefinite, its boundaries being difficult
to define, and its position uncertain.

As a form of average, the mode may be used in time, in
space, and in condition or frequency series. That which occurs
most uniformly during a period of time is modal. For
instance, the modal number of calls per day made by a salesman

¹ Respecting the further use of the median in the treatment of time
series, see pp. 449-453, infra.
to be the most characteristic number. As many calls as ten,
and as few as two are exceptions; they are non-modal. Again,
the most common daily sail of a steamship is 400-450 knots.
Under extremely favorable conditions, she has done 600 knots;
under adverse conditions, as few as 200. The density of
population varies widely from district to district. That condition
most commonly encountered is modal; the extremes
are not modal. Operating expenses in relation to sales as high
as 35 per cent and as low as 12 per cent are occasionally
encountered in the retailing of meat. The characteristic or
modal rate, however, is in the neighborhood of 20 per cent.
Again, most males marry at the ages 25-30, although cases are
found where marriage is contracted by youths of 18 and by
men of 65. These ages, however, are non-modal; they are
not "the rule."

The mode as a statistical short-cut or summary has both a 
general and a precise usage. In such expressions as those 
above and in the following it is used to suggest the prevailing 
condition: "The average man is honest." "The average page
contains 300 words." "The average number of words in a
line of newspaper type is seven." "The average man takes
a '40' coat." "The average length of a class recitation is 50
minutes."

In the second sense, however, it is used more precisely. It
refers to a real or to an imaginary measurement found or
expected in a series. Where its position is indefinite, frequencies
are adjusted by widening the groups into which they
fall, until a modal group is made to appear.¹ Then within
the group, the precise mode is located by interpolation, on the
assumption that the frequency of the items in the neighborhood
of the mode influences its position in proportion to their
respective sizes, or that in a wider universe of which the
series in question is but a sample, there is a modal or most
frequent measurement.


¹ See Tables 18, 46, 48.
If all measurements were continuous and followed the normal
law of error or probability curve, a mode of such precision
would no doubt obtain, both in the sample and in the entire
"population." But not all series are of this type. Some are
discrete, measurements falling at more or less arbitrary units 
which do not arrange themselves in keeping with the normal 
curve of error. In such cases, the search for an ideal modal 
position is illusory. The measurement occurring most times is 
modal, the items appearing above and below it having no in- 
fluence on its position. 

The mode in all cases is a reality — a measurement found 
either in a series or expected in keeping with some underlying 
assumption of distribution. But the mode is no less definite — 
although it is frequently less precise — if in continuous series 
it is spoken of as falling within certain limits, rather than 
as being a precise amount.¹ Indeed, where nothing is known
as to the manner in which instances in series are distributed 
throughout a modal group, or about the accuracy of the meas- 
urements themselves, a mode which is spoken of as falling 
within certain limits may be more precise — nearer the truth — 
than one which is given as a specific amount. 

In series which are discrete, the mode generally falls at a 
particular value. Measurements occur at definite intervals. 
There is no basis for searching for an ideal mode upon the 
assumption that the measurements at hand are only approxi- 
mations, or that a true mode would be found if the samples 
were more numerous. Of course a mode may be made to 
appear by a manipulation of the frequencies — successively 
widening the groups into which they fall — but the wider the 
groups the more unreal does the "mode" as determined in this
manner become. Moreover, to interpolate within a group 

¹ In a recent study, the writer has defined the area marked by deviations
of 20 per cent on either side of the average as modal. See Secrist,
Horace, "Expense Levels in Retailing: A Study of the 'Representative
Firm' and of 'Bulk Line' Costs in the Distribution of Clothing," Bureau
of Business Research, Northwestern University, Series II, No. 9,
Chicago, 1924.
in order to secure a precise mode in such cases is never legitimate
because it must be arbitrarily done. It should never be
made to appear that there is an exact mode when in fact one 
does not exist. 

The meaning of the mode and the manner in which it is 
located can be best discussed in connection with concrete cases 
representing different kinds of series. 

2. HOW THE MODE IS LOCATED 

(1) The Location of the Mode in Historical or Time Series

That which is modal or typical occurs most frequently. The
exceptional is not modal. In Table 44, showing importations 
of raw cotton from 1895-1913, the modal year was not 1913, at 
which time there was imported almost three times as much 
cotton as there was in 1901. This is the exceptional year. 
Years which may be suggested as modal are 1895, 1897, 1898, 
1899, 1901, and 1904, in each of which between 45 and 55 
million pounds were imported. If the conditions set up to 
determine the mode be altered so as to include all years 
in which between 45 and 60 million pounds were imported, 
1896 also must be called a modal year, and 55 + millions a 
modal amount. In this, as in so many cases, the mode is in- 
definite. The way in which historical series may be treated 
in order to determine an approximate mode is illustrated in 
Table 46. 

In this table the amounts are arranged in order of magni- 
tude. The grouping is as follows: column 2, 5 million pounds; 
column 3, 10 million pounds; column 4, 10 million pounds, but 
starting at 45 million and extending to but not including 55 
million; column 5, 15 million pounds; and column 6, 8 million 
pounds. The amounts are equally common in column 1, no 
account being taken of the degrees of absolute difference. In 
column 2 (the grouping being 45 to 50, 50 to 55, etc.) groups 
45 to 50, 50 to 55, and 70 to 75 are equally common. By 
widening them to 10 million pounds, as in column 3, more instances
now appear at the group 50-60 million than at any
other place. By retaining the 10 million pound group but beginning
it at 45 million, a decided concentration appears in the
first group. By extending the width to 15 million, the group
45 to 60 shows the greatest concentration, but a secondary
mode appears in the group 60 to 75 million. Where is the


TABLE 46

Data Showing Importations of Raw Cotton into the United
States, Arranged so as to Determine the Modal Amount

                                    Frequencies
                  Identical          Approximate, by Groups
                   Col. 1    Col. 2    Col. 3    Col. 4    Col. 5    Col. 6
        Am'ts                5 Mil.    10 Mil.   10 Mil.   15 Mil.   8 Mil.
Year    in                   begin-    begin-    begin-    begin-    begin-
        000's                ning at   ning at   ning at   ning at   ning at
                             45 Mil.   40 Mil.   45 Mil.   45 Mil.   46 Mil.

(Each group frequency is entered opposite the first amount in its group.)

1901     46,631       1         3         3         6         7         6
1904     48,841       1
1895     49,332       1
1899     50,158       1         3         4
1897     51,899       1
1898     52,660       1
1896     55,350       1         1                   2                   2
1905     60,509       1         1         2                   5
1900     67,398       1         1                   4                   1
1906     70,964       1         3         3                             3
1908     71,073       1
1903     74,874       1
1910     86,037       1         2         2         2         2         2
1909     86,518       1
1902     98,716       1         1         1         2         2         1
1907    104,792       1         1         2                             2
1912    109,780       1         1                   2         2
1911    113,768       1         1         1                             1
1913    121,852       1         1         1         1         1         1
mode? Undoubtedly the most characteristic amount imported
when the whole period is considered is less than 60 million
pounds. But how much less? The arithmetic mean of the
amounts less than 60 million pounds is 50,695,000 and the
median 50,158,000. The most characteristic amount with a
10 million group is 46 to 56 million, of which there are seven
instances; more narrowly, there are five years in which the
amounts imported are between 49 and 56 million. It is probably
not wise to locate the mode more accurately than in the
group 46 to 54 million (column 6). To do so for this type
of distribution would be to strive for too great precision.

While in this case, the modal amount of cotton imported 
into the United States is probably more accurately stated as 
falling between 46 and 54 million pounds than by using any 
precise amount, even these limits are purely arbitrary. Others 
might with almost equal merit have been chosen. 
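
How much the apparent mode depends on the grouping can be checked by counting the same nineteen amounts under each of the five groupings used in Table 46. The sketch below is illustrative; the function names are hypothetical, and each group is taken to include its lower limit and exclude its upper limit, as the text indicates for column 4.

```python
from collections import Counter

# Yearly importations from Table 44, in thousands of pounds.
amounts = [46631, 48841, 49332, 50158, 51899, 52660, 55350, 60509, 67398, 70964,
           71073, 74874, 86037, 86518, 98716, 104792, 109780, 113768, 121852]


def group_counts(values, width, start):
    """Number of values in each group [lower, lower + width), counted from `start`."""
    counts = Counter()
    for value in values:
        lower = start + ((value - start) // width) * width
        counts[(lower, lower + width)] += 1
    return counts


def modal_groups(counts):
    """The group or groups with the greatest frequency, and that frequency."""
    top = max(counts.values())
    return sorted(group for group, c in counts.items() if c == top), top


# (width, starting point) for columns 2 to 6 of Table 46, in thousands of pounds.
groupings = {"Col. 2": (5000, 45000), "Col. 3": (10000, 40000),
             "Col. 4": (10000, 45000), "Col. 5": (15000, 45000),
             "Col. 6": (8000, 46000)}

results = {label: modal_groups(group_counts(amounts, width, start))
           for label, (width, start) in groupings.items()}
for label, (groups, frequency) in results.items():
    print(label, groups, frequency)
```

Column 2 yields three equally common groups, while the wider groupings each yield a single modal group whose limits shift with the width and starting point chosen, which is the arbitrariness the text warns of.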

It should be noted that the amounts in Table 46 are ar- 
ranged in ascending order, the exact quantities being indicated. 
The frequencies in this case are the numbers of years in which 
the amounts imported fall into different sized groups. With 
any grouping, these must be of uniform size inasmuch as 
comparative frequency is used to secure the mode. An alter- 
native method of presenting the same data would be to set up a 
series of frequency tables with groups of different widths and 
to tally opposite each group the number of corresponding cases 
(years). Of course, if this were done, the historical order of
the series would be broken just as it is in Table 46. Indeed, 
for the calculation of the mode, the order of the years is with- 
out significance. 

If the same data were graphically presented with successive 
time intervals indicated on the X axis, and the amounts shown 
as ordinates at the different years, then the typical or modal 
fact would be indicated by uniformity in the lengths of the 
ordinates. 

When historical data are plotted cumulatively, as in Figure 
62, the modal position or positions are shown by the tendency 
of the graph to increase or decrease, as the case may be, at a
uniform rate. Inasmuch as the chronological order is followed 
in cumulating, modal amounts will probably not be placed in 
juxtaposition. If this is so, the dominant characteristic is 
difficult to locate. The use of the graphic method for deter- 
mining the mode in historical series is not advocated. 

(2) The Location of the Mode in Space Series

Suppose it were desired to find the modal number of passen- 
gers carried on different divisions of a railroad; or the modal 
maintenance cost of road bed for successive miles, data being 
available respectively by divisions and by miles. The problem 
would be analogous to that just given concerning imports of 
cotton for successive years. In the space series, the divisions 
and miles, respectively, would be the frequencies corresponding 
to the different numbers of passengers and to total costs.
Some sort of grouping would undoubtedly be necessary to de- 
termine the modal amount, but the size of the groups would 
probably have to be arbitrarily selected. Moreover, if the data 
were graphically presented on the ordinate or Y axis and the 
successive divisions and miles on the abscissa or X axis, then 
modality would be indicated by uniformity in the lengths of 
the ordinates. Similarly, if for successive divisions and miles, 
the data were cumulated, modality would be shown by the 
tendency of the graphs to increase or decrease, as the case may 
be, at a uniform rate. The graphic method, however, is not
well suited to determine the mode in such series. 

(3) The Location of the Mode in Frequency Series

The measurements of a variable characteristic or attribute
of a phenomenon at an instant of time produce what is known
as a frequency series. The same type of measurement (as
height, for instance) of each member of a class, or repeated
measurements of an individual of a class, give such series.
Their properties have already been discussed in other
connections.¹ We are now interested in the meaning and location
of the mode in such series.

Table 17² shows the number of real estate mortgages in
Wisconsin in 1907, classified by rates of interest. This is a
discrete series. The most common interest rate as shown by
the table is 5 to 5½ per cent. Of the total number of mortgages
(28,961), 10,262 had rates falling within these limits.
This is the modal group, but what is the mode? Widening
the groups as in columns (b) and (c) of the table produces
modal groups at 5 to 6 per cent, and 4½ to 5½ per cent,
respectively. The precise mode, however, is in doubt; it is
no more accurately approached by the latter process. The
truth is that the most common rate is 5 per cent, a conventional
unit for borrowed money, and is not revealed by any
scheme of grouping.

Moreover, inasmuch as this is a discrete series, there is no
reason why one should interpolate for the mode, in an attempt
to give effect to the pull which the frequencies adjacent to
the modal group might seem to have on the location of the
true mode. Instances are not uniformly distributed throughout
the modal group, nor through the groups adjacent to it;
they congregate on definite units. In this case there is no
basis for assuming that the instances are uniformly distributed
on either side of a true mode. Accordingly, the smaller the 
group the better. The mode in this case is not ideally placed 
at the center of a probability series. The items above and 
below it do not help to determine its location. 

The case, of course, is quite different with continuous series.
Tables 18 and 26 and Figure 45 show such series. In these
the measurements are only approximations to an ideal, the
groupings being arbitrary. A true mode both in the samples
and in the complete "universe" may be expected, and it is
legitimate on the basis of what is known about the measurements
to widen the groups until a mode appears. Moreover, its
group position once located, it may be more accurately and
precisely fixed by interpolation, effect being given to the "pull"
of the items adjacent to it. This follows because it is known
by hypothesis that if the measurements were more accurately
made, and the sample more complete, there would be a true
mode. Hence the validity of the attempt to fix it for the
series in question.


¹ See supra, p. 157 f.
² P. 164.

But statistical series are rarely homogeneous — differences 
characterize them in other respects than the attribute which is 
measured. For instance, the carpenters whose wage-rates 
are measured may differ as to training, kind of work done, etc.;
the retail stores whose operating expenses as percentages of
sales are compared differ as to size, location, business management,
etc. All of these non-homogeneous conditions may make
the mode of the aggregate non-typical of the parts. This fact
is illustrated in the series in Table 47. 

Table 47 shows the number of store-periods (monthly) in 
retail meat stores in which the ratios of operating expense 
to sales were classified amounts. For the total, the modal 
per cent group is 18-20; for stores with annual sales of less 
than $20,000, it is 20-22; for those with annual sales between 
$20,000 and $45,000, it is 18-22. For those with annual 
sales between $45,000 and $75,000 it is 18-20 per cent, and 
for those with annual sales of $75,000 and over it is 14-16 
per cent. What is the mode? In spite of the fact that the 
modal group is fairly definite for each class of stores and for 
the total, it varies inversely in size with the amount of business 
transacted. What is typical for the aggregate is not generally 
typical of its parts. 
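
The comparison of the aggregate with its parts can be reproduced by finding the most frequent group in each column of Table 47. The sketch below is illustrative only; blank cells of the table are entered as zero, which is an assumption of the sketch, and the function name is hypothetical.

```python
# Expense groups (per cent of sales) and store-period frequencies from Table 47.
expense_groups = ["10-12", "12-14", "14-16", "16-18", "18-20", "20-22", "22-24",
                  "24-26", "26-28", "28-30", "30-32", "32-34", "34-36", "36-38",
                  "38-40", "40-42"]

# Blank cells in the printed table are entered here as 0.
table_47 = {
    "Total":        [10, 28, 108, 170, 196, 190, 136, 73, 54, 33, 27, 14, 20, 9, 11, 9],
    "Under $20":    [0, 0, 2, 10, 19, 43, 35, 31, 26, 24, 18, 8, 17, 7, 8, 9],
    "$20-$45":      [8, 11, 67, 110, 120, 120, 89, 39, 26, 9, 9, 6, 3, 2, 3, 0],
    "$45-$75":      [0, 6, 17, 37, 47, 20, 11, 3, 2, 0, 0, 0, 0, 0, 0, 0],
    "$75 and over": [2, 11, 22, 13, 10, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def modal_group_labels(frequencies, labels):
    """Label(s) of the group(s) in which the frequency is greatest."""
    top = max(frequencies)
    return [label for label, f in zip(labels, frequencies) if f == top]


modes = {name: modal_group_labels(column, expense_groups)
         for name, column in table_47.items()}
for name, groups in modes.items():
    print(name, groups)
```

The two tied groups in the $20-$45 class correspond to the combined modal range 18-22 per cent given in the text.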

In series which are continuous, as are those shown in Table 
47, modes may be interpolated for within their respective 
groups. The manner in which this is done may be illustrated 
as follows by using the total column in Table 47. The modal 
group is 18-20 per cent, the number of frequencies being great- 
est at this point. In the next higher group there are 190 cases, 
and in the one immediately below, 170 cases. Combined, these
make 360 instances. 170/360 are exerting an influence to place the
mode below the 18-20 per cent group; and 190/360 to place it
above this group. 170/360 of 2 per cent (the width of the modal
group) is 0.94; 190/360 of 2 per cent is 1.06. Accordingly, the


TABLE 47

Number of Store-Periods (Monthly) in which Ratios of Operating
Expense to Sales were Classified Amounts in
Retail Meat Stores

Total Expense    Total            Number of Store-Periods (Monthly) with
Per Cent of      (Store-Periods       Classified Yearly Sales in 000's *
Sales            Monthly)
                                 Under $20   $20-$45   $45-$75   $75 & over

Total               1088            257        622       143         66

10-12                 10                         8                    2
12-14                 28                        11         6         11
14-16                108              2         67        17         22
16-18                170             10        110        37         13
18-20                196             19        120        47         10
20-22                190             43        120        20          7
22-24                136             35         89        11          1
24-26                 73             31         39         3
26-28                 54             26         26         2
28-30                 33             24          9
30-32                 27             18          9
32-34                 14              8          6
34-36                 20             17          3
36-38                  9              7          2
38-40                 11              8          3
40-42                  9              9

* The groups are chosen so as to reflect as accurately as possible
one-man, two-man, three-man, four-man and larger stores. This explains the
reason for their unequal size.
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mode is 18 per cent + 1.06 per cent or 19.06 per cent; or 
conversely, it is 20 per cent — .94 per cent or 19.06 per cent. 
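
The interpolation just described can be sketched in a few lines of Python. This is only an illustration of the adjacent-frequency rule as stated above; the function name `interpolate_mode` and its parameter names are assumptions introduced here, not terms used in the text.

```python
def interpolate_mode(lower_limit, width, freq_below, freq_above):
    """Locate the mode inside the modal group by the adjacent-frequency rule.

    The group above pulls the mode upward in proportion to its share of the
    two neighbouring frequencies; the group below pulls it downward likewise.
    """
    combined = freq_below + freq_above
    return lower_limit + width * freq_above / combined


# Total column of Table 47: modal group 18-20 per cent, with 170 cases in
# the group below (16-18) and 190 cases in the group above (20-22).
mode = interpolate_mode(lower_limit=18, width=2, freq_below=170, freq_above=190)
print(round(mode, 2))  # 19.06
```

Working downward from the upper limit, 20 - 2 × 170/360, gives the same figure, which is the "conversely" of the text.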

Is there such a mode in reality? What is gained by such 
nicety of calculation? Is not such an amount pure fiction? 
Inasmuch as this series is truly continuous, such a mode may 
in fact appear, yet even in this case too great refinement may 
have the effect of making the mode unreal. The figures to the 
right of the decimal point may never be encountered. Yet 
there is no reason why they may not appear since continuity 
characterizes the series. There are, moreover, certain advantages
in making the mode precise, the chief of which is that
in this form it can be compared with the arithmetic mean and
median, two other statistical summaries.

But why consider only the frequencies immediately adjacent 
to the modal group? Why not give weight to all of those 
below and to all those above this position? There is no reason 
why this should not be done, but there is little reason for doing 
it. If a series approaches the normal type, the pull of the items
on one side is largely counterbalanced by that of the items on
the opposite side. In markedly asymmetrical series only, will
the position of the mode be materially changed by giving full
effect to the influence of all of the items, and it is precisely
these in which a “true” mode is not to be expected.

When frequency series are plotted on a simple graph, the 
modal position is shown by the maximum ordinate.¹ The
meaning of the measurement at this ordinate, however, is 
different for discrete and for continuous series. How different 
has already been considered. Such graphic illustrations, in 
this respect, are unlike those showing time and space series. 
In the latter, the maximum ordinate shows extreme rather 
than modal measurements. This follows because at each time 
or space unit on the X axis, a single instance is illustrated on 
the ordinate. The mode is shown by ordinates of equal or 
approximately equal length. 


¹ See Figure 45.

On ogives or cumulative graphs of frequency series, the
mode or place of greatest frequency appears at the position 
where the curve passes through the greatest distance vertically 
in a given distance horizontally, that is, at the position where 
the curve is most nearly vertical, or at the point of inflection. 
Bowley has suggested the empirical rule of rotating a ruler 
on the curve at this point in order to determine its exact loca- 
tion. But this method of determining its position is only 
roughly satisfactory. The modal positions on Figure 48, how- 
ever, were located in this manner. 
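
As a rough numerical counterpart to reading the mode from an ogive, the sketch below builds the cumulative ("less than") totals for the total column of Table 47 and picks the class across which the curve rises most. It assumes equal class widths, as in that table; the variable names are illustrative only, and the graphical ruler method itself is not reproduced.

```python
from itertools import accumulate

# Total column of Table 47: lower class limits (per cent) and frequencies.
lower_limits = list(range(10, 42, 2))
frequencies = [10, 28, 108, 170, 196, 190, 136, 73, 54, 33, 27, 14, 20, 9, 11, 9]

# Ordinates of the "less than" ogive at the upper limit of each class.
cumulative = list(accumulate(frequencies))

# With equal class widths, the rise of the ogive across a class equals the
# class frequency, so the steepest segment marks the modal group.
rises = [upper - lower for lower, upper in zip([0] + cumulative[:-1], cumulative)]
steepest = rises.index(max(rises))
modal_group = (lower_limits[steepest], lower_limits[steepest] + 2)
print(modal_group)  # (18, 20)
```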

When series are arranged in frequency groups and distri- 
butions are irregular, showing no tendency to be dispersed in 
a definite order around a modal center, it is frequently desir- 
able successively to widen the groups, at the same time alter- 
ing the frequencies to correspond, until regularity appears. 
There is always the danger, however, when dealing with dis- 
crete series, of concealing the individual peculiarities of the 
data and of forcing a mode to appear. Group adjustment may 
be used as a method of correcting a false impression, as, for 
instance, when data clearly of the continuous type have been 
distorted from the order which they should properly assume 
because of the limitations of the units in which they are 
expressed or by inadequacy of sampling.¹ It is always a
question, however, to know how far to carry this synthesizing
process.² In effect, it is a method of smoothing and, therefore,
in discrete series, sacrifices individual characteristics in order
to secure general impressions. The peculiarities of the whole
series dominate those of the parts. It should be remembered
that for discrete series, group widening in order to secure
regularity of distribution should rarely be employed. This
topic was discussed in Chapter VI, and can, therefore, be
disposed of with this word of caution, and with brief reference
to Figure 63.

¹ See the table showing the measurements of lengths of lobsters,
Chapter VI, p. 165.

² See Secrist, Horace, Readings and Problems in Statistical Methods,
Macmillan, New York, 1920, pp. 278-282, for a discussion by Knibbs,
G. H., of “The Theory and Justification of Curve Smoothing.”


FIGURE 63

Histograms Showing the Distributions of Ratios of Assessed Values of Buildings
to the Assessed Values of Lands upon which they Stand, New York City, 1914

3. Summary

The mode of a statistical series is always represented by 
actual or implied cases. But not all series have clearly defined 
modes. Continuous series which by hypothesis follow or ap- 
proach the ideal distribution of the normal curve may be 
manipulated in order to secure a true mode. Those which are 
discrete should not be so treated. 

The modes of the parts of an aggregate do not necessarily
average or add to the mode of the total.¹ Moreover, this
form of a statistical summary rejects all exceptional instances,
the type being determined solely by degrees of uniformity.
That which is most common is modal. But commonality is
frequently difficult to define because so much depends upon
the standards by which one chooses to establish it. There can
never be a difference of opinion as to the arithmetic mean of
a series, but there may be as to the mode. The arithmetic
mean is rigidly defined; but a mode is not.

¹ The Bureau of Business Research, Harvard University, adjusts the
modes of the different expenses in conducting retail and wholesale stores
so as to add to the modes of the total expenses. This practice is
equivalent to rigidly defining the mode, a practice to be justified only
when distributions are of the probability type.

VI. The Geometric Mean

The geometric mean of the values of the items in a
series is the nth root of their product. Rather than
adding the values together and dividing their sum by the
number of items, as is done in calculating the arithmetic
mean, the geometric mean is secured by multiplying the
values of the items together and taking the root corresponding
to the number of items. The formula is:

Geometric Mean = $\sqrt[n]{p_1 \times p_2 \times p_3 \times \cdots \times p_n}$,

$p_1, p_2, p_3, \ldots, p_n$ referring to the values of the different
items, and $n$ to the number of items. The arithmetic mean of
2, 3, and 4 is 3; the geometric mean is
$\sqrt[3]{2 \times 3 \times 4} = 2.9$ (approximately).
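
The two definitions can be compared directly in Python. The sketch below is illustrative only; the function names are assumptions made here, and the direct product-and-root method is practical only for short series of positive values.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by their number."""
    return sum(values) / len(values)


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


values = [2, 3, 4]
print(arithmetic_mean(values))           # 3.0
print(round(geometric_mean(values), 1))  # 2.9
```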

The geometric mean is most easily calculated not by succes- 
sively multiplying a series of numbers together and extracting 
the corresponding root, but by using logarithms. Certain rules 
for their use are as follows: 

(1) To multiply a series of numbers together add their 
logarithms. The natural number corresponding to the result 
is equal to the product of the numbers. 

(2) To divide one number by another, subtract the logar- 
ithm of the divisor from the logarithm of the dividend. The 
natural number corresponding to the result is the quotient. 

(3) To raise a number to any power, multiply the logarithm 
of the number by the power exponent. The natural number 
corresponding to the product is the required power of the 
number. 

(4) To extract any root of a number, divide the logarithm 
of the number by the index of the root. The natural number 
corresponding to the quotient is the root of the number. 
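
The four rules can be checked numerically. The sketch below uses common (base-10) logarithms from Python's `math` module; the numbers 8 and 2 and the helper `antilog` are chosen here purely for illustration.

```python
import math

a, b = 8.0, 2.0
log_a, log_b = math.log10(a), math.log10(b)


def antilog(x):
    """Natural number corresponding to a common (base-10) logarithm."""
    return 10 ** x


product = antilog(log_a + log_b)    # rule 1: a multiplied by b
quotient = antilog(log_a - log_b)   # rule 2: a divided by b
power = antilog(3 * log_a)          # rule 3: a raised to the 3rd power
root = antilog(log_a / 3)           # rule 4: cube root of a

for value in (product, quotient, power, root):
    print(round(value, 6))
```

Up to floating-point rounding, the four results are 16, 4, 512, and 2.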

It is desired, for instance, to compute the geometric mean of 
the ratios of total operating expenses to sales for all stores 
as shown in Table 47. The method is as follows: 

(1) Find the log of 11, the middle of the first group. This is
1.0414. Raise 11 to the 10th power, that is, the power corresponding
to the frequency. This is done by multiplying the log
1.0414 by 10, which gives 10.4140.

(2) Find the logs of the centers of each of the other groups, and
multiply them respectively by the powers or the corresponding
frequencies.

(3) Add the products as found in (1) and (2) above. 

(4) Divide the total by the number of powers, that is, by 1088. 

(5) Find the natural number corresponding to this quotient. This 
is the required geometric mean. 

Each of the steps through which the above data must be
carried in order to calculate the geometric mean is shown in
Table 48. The geometric mean ratio is 20.7, the arithmetic 
mean 21.3, the median 20.4, and the mode 19.1. 


TABLE 48

Table Showing the Steps Used in Calculating a Geometric Mean

Ratio Expense
to Sales                                     Products of Logs
(Center of Group)     Logs       Powers      and Powers

11                   1.0414         10          10.4140
13                   1.1139         28          31.1892
15                   1.1761        108         127.0188
17                   1.2304        170         209.1680
19                   1.2788        196         250.6448
21                   1.3222        190         251.2180
23                   1.3617        136         185.1912
25                   1.3979         73         102.0467
27                   1.4314         54          77.2956
29                   1.4624         33          48.2592
31                   1.4914         27          40.2678
33                   1.5185         14          21.2590
35                   1.5441         20          30.8820
37                   1.5682          9          14.1138
39                   1.5911         11          17.5021
41                   1.6128          9          14.5152

Total                             1088        1430.9854

1430.9854 ÷ 1088 = 1.3152, the logarithm of the geometric mean.

The natural number corresponding to log 1.3152 = 20.7 (approximately).

This is the geometric mean.
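
The five steps can be followed in Python as a check on Table 48. This is a sketch under the same assumption the table makes, namely that every item in a group takes the value of the group center; the variable names are illustrative. Full-precision logarithms are used, so intermediate figures differ slightly from the four-place logs of the table.

```python
import math

# Class centers (per cent) and frequencies from the total column of Table 47.
centers = list(range(11, 43, 2))
frequencies = [10, 28, 108, 170, 196, 190, 136, 73, 54, 33, 27, 14, 20, 9, 11, 9]

n = sum(frequencies)  # number of items, 1088

# Steps (1)-(3): multiply each log by its frequency and add the products.
sum_of_products = sum(f * math.log10(c) for c, f in zip(centers, frequencies))

# Step (4): divide by the number of items to get the log of the geometric mean.
mean_log = sum_of_products / n

# Step (5): take the natural number corresponding to that logarithm.
geometric_mean = 10 ** mean_log

# Arithmetic mean of the same grouped data, for comparison.
arithmetic_mean = sum(c * f for c, f in zip(centers, frequencies)) / n

print(round(geometric_mean, 1), round(arithmetic_mean, 1))  # 20.7 21.3
```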


But such a use would rarely be made of this average. This
example is inserted so as to show the manner in which the
computation is made. More appropriate uses of this average
are discussed below in Chapters XV and XVI.¹

¹ Passim.

VII. The Properties of the Arithmetic Mean, the Median, the Mode,
and the Geometric Mean Compared and Contrasted

The properties of the different averages discussed in this
chapter when computed from statistical series may be summarized
as follows:

Characteristics or Properties                  Averages Represented

1. Data Required
   (1) All the frequencies and the exact       Arithmetic Mean, Geometric Mean
       size of all amounts
   (2) All the frequencies but the exact       Median
       size of only certain amounts
   (3) Only certain frequencies and            Mode
       certain amounts

2. Representation in a Series
   (1) May be represented                      Arithmetic Mean, Median, Mode,
                                               Geometric Mean
   (2) Must be represented (actually           Mode
       or ideally)

3. Order of Arrangement for Calculation
   (1) A definite order                        Median and Mode
   (2) Any order                               Arithmetic Mean, Geometric Mean

4. Influence of Extreme Items
   (1) Proportional to their size and          Arithmetic Mean
       frequency
   (2) Proportional to frequency alone         Median
   (3) Small numbers given proportionally      Geometric Mean
       larger influence
   (4) No influence                            Mode

5. Relative Size in the Same Series
   (1) Permanent differences                   Arithmetic Mean exceeds the
                                               Geometric Mean in all cases
                                               except when all values of a
                                               series are equal to each other
   (2) Variable differences                    Relative size of Arithmetic
                                               Mean, Median, and Mode depends
                                               on the distribution of the
                                               items in series

6. Relative Position in Series                 Median always lies between the
                                               Arithmetic Mean and Mode in
                                               mono-modal distributions

7. Degree of Precision of Measurement
   (1) Definite                                Arithmetic Mean and Geometric Mean
   (2) Often Indefinite                        Median and Mode

8. May be Interpolated for                     Median and Mode

9. May be Located Graphically                  Median and Mode

10. Can be Algebraically Treated               Arithmetic Mean and Geometric Mean

11. From Averages of the Parts, Averages       Arithmetic Mean and Geometric Mean
    of a Total may be Secured

12. When Substituted for Each of the
    Original Items
    (1) Sum of, remains the same               Arithmetic Mean
    (2) Product of, remains the same           Geometric Mean

13. Sum of the Deviations from, a Minimum      Median

14. Algebraic Sum of the Deviations            Arithmetic Mean
    from, Equals Zero

15. Can be Calculated from Totals Only         Arithmetic Mean (isolated)

VIII. The Average to Use: Some Typical Cases where
Choice is Important¹

¹ Examples in which it is desirable to use the geometric mean are given
in Chapters XV and XVI.

Suppose a firm were interested in the experience of one of
its salesmen as a basis for promotion to a new territory or
to an advanced wage or salary scale. It is further supposed
that the sales record of this man is available over an ex- 
tended period, the sales being listed by territory, by grade 
of commodity, by prices of the article sold, by profits realized 
by the firm, by the length of time utilized in making them, 
by cost to the firm in present salary and expenses, etc. Can 
the sales experience of this man be averaged? If so, what 
average shall be used? Is the arithmetic mean — an average 
of sales during good and bad days, of sales among all classes 
of buyers, of those requiring one call and those requiring close 
following up, of small and large sales, of those upon which 
small as well as large profits are realized, etc. — a suitable 
measure of a salesman's activity? 

If it is not, then probably a weighted average would be more 
appropriate, especial importance being given to large sales, 
sales of goods upon which a high rate of profit is made, etc. 
Is an average which takes account of the bad days and the 
small sales, of the good days and the large sales, but which 
gives no more importance to one of them than to another 
more satisfactory for this purpose? Such a line of thought 
suggests the advisability of using the median. But, comes 
the retort from one who approaches the problem from another 
point of view: “This man has had a consistent record of 
a high order, and it is neither fair to the man nor to the 
company to give weight to his misfortunes. The facts show 
that he can be expected to make such and such a record — 
the overwhelming percentage of his sales are of this character; 
or, in other words, the percentage of the time in which he fell 
below a high standard is negligible and should be given no 
weight. If his mistakes and failures are considered, a pre- 
mium will be put upon mediocrity and insufficient recognition 
given to real merit.” Such an argument suggests the wisdom
of using the mode. 

It may be argued that it is unwise to let any one set of 
circumstances govern, no matter from what angle the problem
is approached, and, undoubtedly, this is true. However, no
matter how carefully the promotion is considered, if the facts
above indicated are held to be germane, it is necessary to
decide upon the weight to be assigned to the approaches
indicated in these different averages. It is, of course,
conceivable that the various averages would not be materially
different. If this is true, any one of them may be used.
Whether averages can be used is one question; which one to
use, in case they are allowable, is quite another. It is the
latter question which is now being discussed.

Again, suppose that one were interested in the time neces- 
sary to reach his work — a fact governing his location for 
residential purposes — and that there existed but one available 
means of transportation. Is it the arithmetic mean time, 
the median time, or the modal time in which the distance is 
traveled which is of interest? Delays happen even in connection
with the best transportation service.¹ Should the possibility
of these be considered or should they be regarded as
negligible on the ground that they are irregular and uncertain? 
If one sets great weight upon punctuality, he undoubtedly will 
allow for this factor in spite of its contingency. 

On the other hand, if the transportation company in ques- 
tion were advertising its service, it would feature the typical 
or modal if not the shortest performance. If many measure- 
ments were taken of the required time to make the trip, it is 
doubtful whether the differences between the various averages 
would be large. The distribution of frequencies would tend 
to conform to the normal law of error curve and the averages 
closely to agree. On the other hand, if few measurements
were taken, and if the delays were frequent, the characteristic
or modal time might be widely different from the mean time. There
would be no tendency for delays to be compensated for by
exceptionally quick service, since most of the runs would be
made according to schedule. The arithmetic mean would exceed
both the median and the mode. It is precisely this fact
which needs to be considered by the person who desires to
reach his office each morning at or before a stated time, and
which the advertising manager of the company desires not
to bring to the attention of the public. It is evident that the
averages accurately reflect the characteristics of the data, but
they call attention to different things.

¹ See “Report” of the Chicago Traction and Subway Commission, “On a
United System of Surface, Elevated and Subway Lines,” pp. 272-274,
Chicago, 1916, for an analysis of the classified causes of one year's
reported delays of more than five minutes' duration on the surface lines.
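
The effect of occasional delays on the three averages can be seen in a small Python sketch. The trip times below are hypothetical, invented for this illustration and not taken from any source; they represent a run normally made in 30 minutes, with three delayed trips.

```python
from statistics import mean, median, mode

# Hypothetical trip times in minutes: most runs keep to a 30-minute
# schedule, a few are slightly quick or slow, and three are badly delayed.
trip_times = [30] * 12 + [29, 29, 31, 31, 32] + [45, 48, 55]

typical = mode(trip_times)    # the time most often observed
middle = median(trip_times)   # the middle time when trips are ranked
average = mean(trip_times)    # total minutes divided by number of trips

print(f"mode {typical:.1f}, median {middle:.1f}, mean {average:.1f}")
# mode 30.0, median 30.0, mean 33.0
```

With these figures the delays raise the arithmetic mean by three minutes while leaving the median and mode at the scheduled time, since no run is quick enough to offset them.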

One might be interested in the “average” suit of ready-made
clothes turned out by a clothing concern, but the kind of
average best suited to his purposes will depend upon what 
those purposes are. If he is in the production side of the busi- 
ness his interest is in typical or standard sizes determined for 
him by the physical facts of size and proportion of men. The 
great majority of sales will be to individuals who conform 
within narrow limits to standard measurements. The manu- 
facture of these garments constitutes his problem. His inter- 
est lies in the modal suit; not in the median nor in the 
arithmetic mean, as such. If he considered the arithmetic 
mean and manufactured his garments according to the sizes 
determined by such a calculation, it is doubtful if his cus- 
tomers could be fitted, since such measurements imply that the 
exceptionally large and the exceptionally small will affect 
the measurements of suits designed for the great homogeneous 
and standard majority. If large quantities of suits were man- 
ufactured, it is true that the mode, the median, and the arith- 
metic mean sizes would closely agree; but by the prudent pro- 
ducer this agreement would be taken for granted only where 
production was on the largest scale. 

Likewise, if the value instead of the size of the “average”
suit were uppermost in one's mind, it is doubtful if the arith- 
metic mean would be particularly enlightening. Such a 
figure is too general, too indefinite, for any but the most 
superficial purposes. Some sizes tend to be normal; this
grows out of a physical fact. Values tend to be normal or
characteristic too, but their normality is not reflected in an 
arithmetic mean, as it is in the case of sizes, since all values 
may or may not be represented in the various sizes manu- 
factured. Suits which can be manufactured according to 
set measures and in large quantities, other things being 
equal, tend to be cheap. Suits which are manufactured only 
to special order and in relatively small quantities, other 
things being equal, tend to be dear. The exceptional in 
either case would be weighted heavily and the characteristic 
be far different from the mean price. As a basis for roughly 
estimating profit an arithmetic mean price may be all that
is required, but for shaping a selling policy an intimate study 
of the characteristic prices for the various types of demand 
is necessary. This is merely another way of saying that 
only homogeneous data can be properly averaged, and that 
the merits of each average must be settled in the light of its 
use. 

The errors into which one may be led by indiscriminately 
using an average of non-homogeneous data are admirably 
shown in Table 49 giving deaths and death-rates of married 
and unmarried men in Scotland.¹

“The first striking fact which this table reveals is that the death-rate
of the bachelors was double that of the married men between
the ages of 20 and 25. As the persons became older, this excessive
difference in the death-rates of the married and the unmarried
decreased slowly and regularly, showing the difference in favor of
the married men at every period of life. It is thus proved that the
state of bachelorhood is more destructive to life than the most
unwholesome trades. When we come to the total death-rate at all
ages, however, the very reverse is the case. The general death-rate
among married men is very much higher than that among single
men; so that, while only 1,723 bachelors died during the year out
of every 100,000 bachelors, 2,338 married men died out of a like
number of married men.

“This apparent contradiction may be explained as due to the fact
that the number of bachelors being far greatest at that period of
life when the mortality is very low, namely, from 20 to 24, whereas
the number of married men is greatest at those periods of life
when mortality is high, seeing that mortality increases with age.

¹ See also an analogous case in Secrist, Horace, “A Statistical Paradox,”
Journal of the American Statistical Association, June, 1923, pp.
776-780.


TABLE 49

Table Showing Deaths and Death-Rates of Married and
Unmarried Men in Scotland, 1863, Classified by Age Groups

(From the 9th Detailed Report of Dr. James Stark to the Registrar-General
of Births, Deaths, and Marriages in Scotland)

                          Married                        Unmarried
                 Number              Death-     Number              Death-
Ages             Living    Deaths    Rate       Living    Deaths    Rate

All ages                   11,765     23.4     243,259*    4,189     17.2
20-25            22,946       137      6.0     106,587     1,251     11.7
25-30            54,221       469      8.7      48,618       666     13.7
30-35            66,153       600      9.1      25,962       383     14.8
35-40            63,858       690     10.8      15,857       253     16.0
40-45            62,645       782     12.5      12,311       208     16.9
45-50            54,505       869     15.9       8,824       179     20.3
50-55            49,591       880     17.7       7,636       205     26.8
55-60            38,006       929     24.4       5,550       142     25.6
60-65            35,920     1,216     33.9       5,242       227     43.3
65-70            22,021     1,134     51.5       2,848       156     54.8
70-75            16,029     1,291     80.6       2,021       205    101.4
75-80             9,716     1,135    116.8       1,081       157    145.4
80-85             5,477       953    174.0         513       101    196.9
85-90             1,708       488    285.7         151        32    211.9
90-95               449       137    305.1          50        21    420.0
95-100              103        40    388.4           6         3    500.0
100 and above        28        15    535.7           3

* As reported. The correct total from the addition is 243,260. The
table is quoted from Bliss, George I., “The Influence of Marriage on
the Death-rate of Men and Women,” in Quarterly Publications of the
American Statistical Association, March, 1914, p. 55.

Furthermore, almost half of all the deaths of the bachelors occur
before the thirtieth anniversary, at which period the mortality is 
much lower than at the more advanced periods of life. When the 
whole deaths at all ages are thrown together and compared with 
the total bachelors living, the general mortality seems to be little 
higher than that due to the earlier period of life. Among the married 
men, on the other hand, the greatest number of deaths occur be- 
tween the sixtieth and seventy-fifth year of life, at which period 
the mortality is high as compared with the number living. Conse- 
quently, when the total deaths of husbands of all ages are compared 
with the total living, a high mortality seems to have prevailed, 
because the persons were all so much older when they died than 
were the bachelors. Therefore, comparing the total deaths of the 
married at all ages with the total deaths of the bachelors, neces- 
sarily leads to a false conclusion. In comparing mortality rates of 
two or more classes, to be correct, it must be limited to comparing 
at each age group, and the smaller we take the age group the more 
nearly correct are the rates.”¹
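
The contrast between the all-ages rates and the rates within age groups can be recomputed from the rows of Table 49. The Python sketch below is an illustration, not part of the quoted report. It makes two assumptions, both marked in the comments: the all-ages totals are obtained by adding the rows, and the blank entry for deaths of unmarried men aged 100 and above is taken as zero. Rates are expressed per 1,000 living.

```python
# Rows of Table 49: (age group, married living, married deaths,
#                    unmarried living, unmarried deaths).
# Assumption: the blank cell for unmarried deaths at "100 and above" is 0.
rows = [
    ("20-25", 22946, 137, 106587, 1251),
    ("25-30", 54221, 469, 48618, 666),
    ("30-35", 66153, 600, 25962, 383),
    ("35-40", 63858, 690, 15857, 253),
    ("40-45", 62645, 782, 12311, 208),
    ("45-50", 54505, 869, 8824, 179),
    ("50-55", 49591, 880, 7636, 205),
    ("55-60", 38006, 929, 5550, 142),
    ("60-65", 35920, 1216, 5242, 227),
    ("65-70", 22021, 1134, 2848, 156),
    ("70-75", 16029, 1291, 2021, 205),
    ("75-80", 9716, 1135, 1081, 157),
    ("80-85", 5477, 953, 513, 101),
    ("85-90", 1708, 488, 151, 32),
    ("90-95", 449, 137, 50, 21),
    ("95-100", 103, 40, 6, 3),
    ("100 and above", 28, 15, 3, 0),
]


def rate_per_1000(deaths, living):
    return 1000 * deaths / living


# Assumption: all-ages totals are the sums of the rows.
married_living = sum(r[1] for r in rows)
married_deaths = sum(r[2] for r in rows)
unmarried_living = sum(r[3] for r in rows)
unmarried_deaths = sum(r[4] for r in rows)

crude_married = rate_per_1000(married_deaths, married_living)
crude_unmarried = rate_per_1000(unmarried_deaths, unmarried_living)

# Age groups in which the married rate is the lower of the two.
married_lower = [age for age, ml, md, ul, ud in rows
                 if rate_per_1000(md, ml) < rate_per_1000(ud, ul)]

# Share of each class that is under 30, where mortality is lowest.
share_married_under_30 = sum(r[1] for r in rows[:2]) / married_living
share_unmarried_under_30 = sum(r[3] for r in rows[:2]) / unmarried_living

print(round(crude_married, 1), round(crude_unmarried, 1))  # 23.4 17.2
print(len(married_lower), "of", len(rows), "age groups favor the married")
print(round(share_married_under_30, 3), round(share_unmarried_under_30, 3))
```

On these figures the married rate is the lower in 15 of the 17 age groups (the groups 85-90 and 100 and above are the exceptions), yet the all-ages rate is higher for the married, because well over half of the bachelors, against less than a sixth of the married men, are under 30.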

While this illustration is drawn from mortality statistics, 
and seems to have little or no bearing on the problems of the 
business man, except in so far as it illustrates the error into 
which one may be led by making his basis of generalization 
too broad, and therefore his conclusion too indefinite, it sug- 
gests a problem of practical import to the business world. 

In most states, laws now require that employers of labor 
provide in some manner for the compensation of accidents 
which occur to their employes while engaged in the regular 
course of business. Because of the failure to define an
“accident,” and because accidents which occur are related to so
broad a base, without differentiating between hazardous and
non-hazardous occupations, slight and severe accidents; and
because of the failure to keep accurate records of accidents,
employers have not had until recently, if they now have,
an adequate basis for computing accident risks.²

¹ Quarterly Publications of the American Statistical Association,
March, 1914, p. 56.

² Rubinow, I. M., “The Standard Accident Table as a Basis for
Compensation Rates,” Quarterly Publications of the American Statistical
Association, March, 1915, pp. 358-415.

Discrimination between severe and minor accidents, and
hazardous and non-hazardous conditions of employment, is
the first essential to clear thinking about accidents, and a
partial guaranty of the reasonableness of insurance
premiums.³ A rough arithmetic mean, a median, or a mode, per
se, is not enough. What is necessary is the determination of
the characteristic accident rate, not for industries as a group, 
but for conditions of employment, definitely standardized, 
within each industry. 

Statistics should always relate to definite conditions and 
circumstances. Duplicate these and the statistical facts are 
likely to be repeated. Alter them and the consequences are 
different. Before a policy can be mapped out on the basis 
of statistical facts alone, or given consequences said to follow 
from given conditions, the latter must be definitely and clearly 
defined and their boundaries indicated. 

So-called statistical laws operate with implacable regularity 
only when conditions producing them occur with unchanging 
persistence. To establish beyond cavil cause and effect requires 
not only that statistical data be referred solely to the conditions
that produce them, but also that the statistical means
employed to interpret them be appropriate to the purposes in 
mind. There is no excuse for assigning meaning to averages 
without taking the trouble to determine the conditions which 
produce them or their suitability to the cases in point.

“An average is not to be regarded as a secret something which
determines events. This blunder is often made in social statistics.
After finding a certain average in human affairs, we conclude that
some secret fate is at work. By the aid of a little rhetoric we
easily persuade ourselves that an event is fully accounted for when
‘the law of averages’ demands it. There may be an average in
birth and death and crime, but, after all, the average is not
responsible for any of them. It takes something more potent than
an average to produce typhoid fever or to crack a safe.”⁴

³ Ibid., pp. 358 ff.

⁴ Coffey, P., Science of Logic, Longmans, London, 1912, Vol. II, p. 291.

To employ an average suggests the formulation of a judgment
or a conclusion following from a full consideration of detail
which it replaces. An average represents the culmination of
a process of thought, which when removed from the steps
required for its determination is likely to be assigned new
meanings and used for purposes foreign to those for which
it was designed. Given statistical application, this means that
chronologically averages come late in the process of analysis.
They should be used with discrimination and supported by
detail, with the realization that they emphasize the
generalizations and comparisons which seem to be warranted after a
careful and painstaking scrutiny of the problem from the
angle from which it is approached.⁵

The functions of averages are unmistakable; the justification
of employing them must be determined by an appeal to all
the facts and in the light of the peculiarities characteristic
of the different types. As a statistical caution let it be said:
Do not rush headlong into the use of averages. They are
commonly but vaguely understood, and it is the particular
function of the statistician to adopt that caution and
circumspection in the use of numerical facts which the seeming
exactness of his tools appears not only to suggest but to make
imperative.

⁵ “But however often an average may have been confirmed, we can
never attribute to it the importance of being by itself the expression of
any necessity. Every result is necessary when its conditions are given;
every particular instance was necessary in so far as from the given
conditions it could only be such and no other; all individual
determinations and differences in the particular cases, which were neglected
by the average, were necessary; the most extreme deviations were
necessary, and it will also be necessary, if all the particular conditions recur
in exactly the same way, that they should again have the same results,
and that therefore the sum of the results will be the same. . . .

“Such uniformities of numbers and averages are primarily mere
descriptions of facts which need explanation as much as the uniformity
of the alternation between day and night; and the explanation can be
found only where the actual conditions . . . are forthcoming. But these
are the concrete conditions of the particular instances counted, they are
not directly causes of the numbers; it is only the nature of the concrete
causes which can show it to be necessary for the effects to appear in
certain numbers and numerical relations.” Sigwart, C., Logic, Swan
Sonnenschein Co., London, 1895, Vol. II, p. 490.


IX. Summary and Conclusion

An average should be considered as derivative and as 
summarizing and characterizing data in a single expression.¹
The average best suited for a particular use depends upon the 
purpose one has in mind. Frequently, it is desirable and neces- 
sary to compute not only the arithmetic mean but also the 
median and mode in order to safeguard oneself against criticism
and to reflect types of distributions more in detail. The
relations of these averages one to the other are interesting. 
If it is remembered (1) that the computation of the arithmetic 
mean and the median requires all the frequencies; (2) that 
the former is affected by both the size of items and frequen- 
cies, while the latter is affected by frequencies and not by the 
size of items except those at or near the middle; and (3) that 
in the computation of the mode both the size and frequencies 
of exceptional items are ignored, then it is evident that in 
changing the order or number of frequencies the mode is 
scarcely affected at all; the median is only slightly affected, 
and the arithmetic mean violently affected. 
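
The contrast drawn above can be checked on a small case. The following is a minimal sketch in Python, assuming a hypothetical wage series (the figures are illustrative and are not taken from the text); it adds one exceptional item and compares the three averages before and after.

```python
from statistics import mean, median, mode

# Hypothetical daily wages in dollars (illustrative values only).
wages = [5, 6, 6, 6, 7, 7, 8, 9]
with_exception = wages + [60]   # the same series with one exceptional item


def summary(items):
    """Return the three averages of the first order for a list of items."""
    return {"mean": mean(items), "median": median(items), "mode": mode(items)}


before = summary(wages)
after = summary(with_exception)

for name in ("mean", "median", "mode"):
    print(f"{name:>6}: {before[name]:6.2f} -> {after[name]:6.2f}")
```

In this sketch the mode does not move, the median moves by half a dollar, and the arithmetic mean nearly doubles, which is the ordering the paragraph describes.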

No single average suffices for all purposes. Each is affected
differently by arrangement, frequency, and size of items, and
should be used with a full knowledge of the peculiarities of
distributions. One is never justified in employing a short-cut
expression in order to describe a complex whole unless he realizes
what its use implies. Too frequently averages are used
without discrimination. Derivative expressions of this character
are often imperfect substitutes for detail. Frequently,
an exceptional instance which would be ignored in the use of
the mode is that particular instance in which one has greatest
interest. On the other hand, the inclusion of an exceptional
item in determining the arithmetic mean may so prejudice it
as to give a wholly erroneous picture of the characteristics
which are dominant. The average to be used is
invariably a function of the purpose which one has in mind.

¹ An average “is an abbreviation, and it has so much in common with
the ordinary logical abstract concept that it neglects all differences, and
we cannot tell from it how far the numbers from which it is obtained, or
which it has to represent, may differ from each other. It is, however,
inferior to the general concept in so far as the latter is a statement of
what is the same in all the particular instances, while the average is
merely a fictitious value which may never actually occur in any particular
case, and which by itself does not even justify us in expecting that
the majority of the particular instances in a region will approximate to
it.” Sigwart, op. cit., Vol. II, p. 487.

As classified data are more readily understood and compared 
than those in heterogeneous form, and tabular arrangement 
superior to unscientific classification, so summary expressions 
of complex data in the form of averages are frequently more 
significant than the detail. The passage, however, from the 
particular to the general, that is, from details to averages,
offers precisely the opportunity for eliminating the peculiar 
and significant features of discrete series. In the case of con- 
tinuous series the conditions are somewhat different. As the 
widening of groups may result in a more accurate expression 
of a general tendency or an ideal distribution, so a more ac- 
curate expression of a complex whole may result from the use 
of a single unit, as mean, median, or mode. 

Caution, foresight, and analysis are necessary at every step 
in the use of averages — caution as to the averages to be em- 
ployed, foresight as to the meaning which may be attached to 
them, and analysis as to the possibilities of data being char- 
acterized in such a manner. The following tests should always 
be applied: Is it possible to employ a single expression to 
depict the details which are essential in order to view the data 
in all their bearings? Is the greatest interest in the charac- 
teristic feature, in the median position, or in the center of 
gravity at which the arithmetic mean falls? Is it necessary 
to employ all of these descriptive units? No single answer to 
these various inquiries can be given. The use of an average 
may be legitimate and still the question as to the most appropriate
average be left in doubt. One cannot answer the first
question, as it were, by intuition. Data must be analyzed and 
the functions of averages in general and in particular be 
clearly understood before answer can be given. As caution 
and analysis are necessary in the employment of averages, so 
discrimination and judgment are necessary in assigning im- 
portance to them when used by others. 

A fitting close to the discussion of averages is found in the
words of Dr. John Venn. “Every sort of average, and there
are many such sorts, is a single fictitious substitute of our
own for the plurality of actual values existent in the results,
which are naturally or artificially set before us. It is impossible,
therefore, for the former, in any case, effectually to take
the place of the latter. But the extent to which it may succeed
or fail in doing so will depend upon the nature of the
facts presented to us, and still more upon the precise object
we have in view.”
CHAPTER X


DISPERSION 
I. Introduction 

The preceding chapter was concerned with averages of the
“first order,” those statistical summaries computed from the
gross items in different kinds of series. It was learned that
they have different properties; that they require the details 
from which they are calculated to be treated differently ; that 
some ignore or treat lightly exceptional instances, while others 
attach to them marked significance; etc. Notwithstanding 
their differences, however, they all have one common purpose,
that is, to serve as substitutes for or types of the detail
which they replace. 

But different averages may and generally do give different
“types” for the same series. Which, then, is to be selected?
The answer to this question must be determined in the light of 
the purpose which one wants the type to serve. As the pur- 
pose differs, the selection of the averages must of necessity 
change. 

But averages of the “first order” while useful never fully 
characterize the detail from which they are made up. In all 
but the rarest cases some or all of the items differ from the 
one or ones which are selected as a type. Some measure of 
the differences from the average is necessary. Averages of 
the “second order” serve this purpose. By their use a type 
not of the gross items but of the differences of these from 
some center or position is secured. Indeed, in some cases, 
more than a type is required. To average them is equivalent 
to doing for them the same thing which is done for the gross 
items, that is, merging their dissimilarities (by using an
arithmetic mean); selecting those which are typical (by using
a mode); or choosing the one centrally located (by using the
median). An alternative to the selection of a type is to employ
some form of a distribution of detail, but this is often
unsatisfactory for the same reason that it is when dealing with 
the gross items themselves. Precise summaries are needed if 
for no other reason than because of their brevity. 

The things about statistical series which it is desirable to 
know are: (1) the number of instances involved; (2) the aver- 
age, central or typical fact; (3) measurements of the differ- 
ences of the individual items from each other or from their 
averages; and (4) summaries of the manner in which the items 
are distributed about their average. To secure the summaries 
in (2), averages of various sorts are computed; to obtain
those in (3), measures and coefficients (ratios) of dispersion 
are calculated; and to get those in (4), measures and coeffi- 
cients (ratios) of skewness (degrees of asymmetry) are deter- 
mined. Summaries of the second type are discussed in the 
preceding chapter; those of the third type, in this; and those 
of the fourth, in Chapter XII. 

II. The Meaning of Dispersion

In statistics, there are two uses of the term “dispersion.”
One is general, calling attention to the fact that the items in
statistical series differ in size. A wage series with items running
from $4.00 to $12.00 per day is spoken of as having
greater dispersion than another one having items ranging from
$5.00 to $8.00 per day. That is, the instances are dispersed or
scattered over a wider range in the first than in the second
case. The amplitude of variation is greater. In a more
precise sense, the term is used as an absolute or relative measure
of the differences of the items in a series from the average
or characteristic amount. The first use calls attention
to the limits within which data fall; the second use, to an
amount (absolute or relative) by which the data differ from
a selected standard or type. The two uses are fundamentally
different. In what way will appear as the methods of measuring
dispersion are described.

III. Measures and Coefficients of Dispersion
1. THE METHOD OF LIMITS

Dispersion in the general sense indicated above is shown 
by the “method of limits,” the complete range of the values
of the items or other conventional divisions being used for 
this purpose. Examples of the use of different limits, and 
of ways of stating the degrees of dispersion will illustrate this 
method. 

(1) The Range 

The simplest way of expressing the degree of difference between
items in statistical series is to choose the extreme limits
within which they fall, that is, to select a minimum above
which and a maximum below which all items are found. In
frequency series, however, it is difficult to define the limits
exactly if the groups at the ends of the series are open. When
this occurs, approximation is necessary. In historical series,
on the other hand, approximation is unnecessary, the actual
amounts always being given. Moreover, the selection of extremes
in the two kinds of series has a different significance.
In those of the frequency type the extreme measurements are
relatively few. This is always the case in series which are
symmetrical and in those which approach the normal curve of
error form.¹ Accordingly, to select the extremes is to choose
non-typical cases. In historical series, on the other hand, since
there is no presumption of normal distribution, either extreme
may be as nearly typical as any other measure. But in either
case, to measure dispersion by the range gives no idea of the
distribution between the extremes. Illustrations will show
the force of this contention in typical cases.

¹ For an illustration of the ideal curve of error, see p. 378.

In the historical series in Table 45, the extremes are
46,631,000 lbs. and 121,852,000 lbs. This fact carries a certain
amount of significance but it does not indicate the dispersion
of the items between these limits. It does, of course, rule out
the idea that amounts as small as 20,000,000 lbs. or as large
as 200,000,000 lbs., for instance, were imported. It does not,
however, indicate the fact that the minimum amount is far
more characteristic of the series than is the maximum.
Moreover, the extremes might remain the
same and the distribution between them be quite different.

In the frequency distribution in Table 43 the limits are
$5.00 and $14.99, but such amounts are exceptional. Moreover,
the frequencies in the lowest group, $5.00 to $5.99, are
fifteen times as numerous as those in the highest group, $14.00
to $14.99. As to the distribution of values between these
limits, the range tells us nothing. Something more than this
crude measure is necessary. This “something” is supplied by
the cumulative- or moving-range method described below.

If the time series is used, some such dispersion summary
as that shown in Table 50 may be prepared, the amount of
detail being varied to suit the needs of the problem.

TABLE 50

Table Illustrating the Cumulative- or Moving-Range Method
of Showing Dispersion in Historical Series

                           Importations
Years            Amounts in (000's) lbs.    Per cent
1895 to 1913           1,421,152              100.0
1895 to 1900             326,797               23.0
1895 to 1905             656,368               46.2
1895 to 1910           1,075,752               75.7

The data may be put in this manner:

1895 to 1913           1,421,152              100.0
1910 to 1913             431,437               30.4
1905 to 1913             825,293               58.1
1900 to 1913           1,161,753               81.7

Applying the same method to the frequency series in Table
43, p. 287, an arrangement similar to that in Table 51 might
be used.


TABLE 51

Table Illustrating the Cumulative- or Moving-Range Method
of Showing Dispersion in Frequency Series

                                               Frequencies
                                            Amounts   Per cents
As much as $5 but less than $15.00 . . .      434       100.0
As much as $5 but less than $ 8.00 . . .      121        27.9
As much as $5 but less than $11.00 . . .      374        86.2
As much as $5 but less than $14.00 . . .      433        99.8

Or in this manner:

Less than $15 but more than $ 4.99 . . .      434       100.0
Less than $15 but more than $13.99 . . .        1          .2
Less than $15 but more than $10.99 . . .       60        13.8
Less than $15 but more than $ 7.99 . . .      313        72.1


The method of showing dispersion by the cumulative- or 
moving-range consists in establishing a series of cumulations 
by adjusting the sizes of groups. Grouping may be begun 
from either end and carried forward step by step. The thing 
that is striven for is a summary which characterizes the com- 
plete distribution. 
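
The cumulations described above may be sketched in a few lines of Python. This is only an illustration: the cumulative counts are those printed for the frequency series in Table 51, while the names `cumulative_below`, `from_low_end`, and `from_high_end` are introduced here for convenience and do not come from the text.

```python
# Cumulative number of cases below each upper wage limit (dollars per day),
# as printed in Table 51. The series begins at $5 and contains 434 cases.
total = 434
cumulative_below = {8.00: 121, 11.00: 374, 14.00: 433, 15.00: 434}


def percent_of_total(count, whole):
    """Express a count as a per cent of the whole, to one decimal place."""
    return round(100 * count / whole, 1)


# Cumulating from the low end: share of cases below each limit.
from_low_end = {
    limit: percent_of_total(count, total)
    for limit, count in cumulative_below.items()
}

# Cumulating from the high end: share of cases at or above each limit.
from_high_end = {
    limit: percent_of_total(total - count, total)
    for limit, count in cumulative_below.items()
    if limit < 15.00
}

print(from_low_end)
print(from_high_end)
```

Either direction of cumulation carries the same information; the choice depends on whether interest lies in the lower or the upper part of the series.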

But the use of the range method whether stationary or 
moving does not make it possible to compare the relative dis- 
persion of two series expressed in different units. Such a 
comparison can be made, however, by reducing the absolute 
measures to relative bases. This may be done by dividing 
the difference between the extremes by their sum. In the
cases used for illustration, the coefficients or ratios of dis- 
persion are as follows: 


In the historical series:

\[
\frac{121{,}852{,}000\ \text{lbs.} - 46{,}631{,}000\ \text{lbs.}}{121{,}852{,}000\ \text{lbs.} + 46{,}631{,}000\ \text{lbs.}} = .45
\]

In the frequency series:

\[
\frac{\$15 - \$5}{\$15 + \$5} = .50
\]

But to show dispersion, limits other than the extremes may 
be selected. The 1st and 9th deciles are often used for this 
purpose. The measure of dispersion based upon them is 
secured by taking their difference, and the coefficient obtained 
by dividing this quantity by their sum. Relative amounts of 
dispersion of the price changes in 1897 and in 1910, as shown 
in Table 52, when computed within the limits of the 1st and 
9th deciles, are as follows: 


\[
\text{1897: } \frac{102 - 71}{102 + 71} = .18; \qquad \text{1910: } \frac{187 - 86}{187 + 86} = .37
\]

The corresponding coefficients based upon the extremes are: 


\[
\text{1897: } \frac{128 - 56}{128 + 56} = .39; \qquad \text{1910: } \frac{363 - 48}{363 + 48} = .77
\]


The effect of choosing the 1st and 9th deciles rather than 
the extremes is to reduce the relative dispersion by approxi- 
mately one half. 
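
The statement that the decile limits roughly halve the relative dispersion can be verified from the four figures used for each year. The following Python sketch takes the lowest price, 1st decile, 9th decile, and highest price for 1897 and 1910 as quoted from Table 52; the names `rows` and `coefficient` are introduced here for illustration only.

```python
# (lowest, 1st decile, 9th decile, highest) relative prices, from Table 52.
rows = {
    1897: (56, 71, 102, 128),
    1910: (48, 86, 187, 363),
}


def coefficient(low, high):
    """Coefficient of dispersion: difference of the limits over their sum."""
    return (high - low) / (high + low)


results = {}
for year, (lowest, first_decile, ninth_decile, highest) in rows.items():
    by_deciles = coefficient(first_decile, ninth_decile)
    by_extremes = coefficient(lowest, highest)
    # Ratio of the decile-based coefficient to the extreme-based one.
    results[year] = (by_deciles, by_extremes, by_deciles / by_extremes)
    print(year, round(by_deciles, 2), round(by_extremes, 2),
          round(by_deciles / by_extremes, 2))
```

In both years the ratio in the last column falls a little below one half.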

Another method of showing dispersion by the method of 
limits, but of a somewhat different type from the selection of 
the extremes or a pair of deciles, is to take the ranges covered 
by successive tenths (deciles) in a series. This is done in an 
interesting way by Mitchell in the note on page 330. 

The relation of the dispersion of one part of a statistical
series compared with that of the whole may be determined by
comparing the range of the middle fifty per cent of the cases
with that of the total. For instance, the inventories as per
cents of sales for the middle half of a group of retail clothing
stores fall within a range of one third of that covered by the
entire group. That is, dispersion is much less for the part
selected than for the entire series. By an extension of the
same method, the lower may be compared with the upper
half; or any part with any other part. For many purposes
such comparisons are illuminating.

Average Concentration of Price Fluctuations Around the Median,
1891 to 1913

[The fluctuations represent percentage changes from average prices in the
preceding year.]

                         Average Range Covered by the:

Pairs of tenths                Successive tenths       Central tenths
of the price fluctuations      of the price            of the price fluctuations
                               fluctuations
1st and 10th tenths   69.4      1st tenth   27.0       Central two tenths      3.6
2d and 9th tenths     11.8      2d tenth     4.9       Central four tenths     7.8
3d and 8th tenths      6.1      3d tenth     2.6       Central six tenths     13.9
4th and 7th tenths     4.2      4th tenth    2.2       Central eight tenths   25.7
5th and 6th tenths     3.6      5th tenth    1.8       Whole number           95.1
                                6th tenth    1.8
                                7th tenth    2.0
                                8th tenth    3.5
                                9th tenth    6.9
                               10th tenth   42.4

“The central division of the table shows that the average range covered
by the fluctuations diminishes rapidly as we pass from the cases of greatest
fall toward the cases of little change, and then increases still more
rapidly as we go onward to the cases of greatest rise. The right-hand
group of columns shows how the range increases if we start with the two
middle tenths, take in the two tenths just outside them, then the two
tenths outside the latter, and so on until we have included the whole body
of fluctuations. The left-hand group of columns, on the other hand,
combines in succession the two tenths on the outer boundaries, then the
two tenths immediately inside them, and so on until we get back again
to the two central tenths. Perhaps the most striking single result
brought out by this table is that eight tenths of all the fluctuations are
concentrated within a range (25.7 per cent) slightly narrower than that
covered by the single tenth that represents the heaviest declines (27 per
cent), and much narrower than that covered by the single tenth that
represents the greatest advances (42.4 per cent).”

Mitchell, Wesley C., “Index Numbers of Wholesale Prices in the
United States and Foreign Countries,” Bulletin of the United States
Bureau of Labor Statistics, No. 173, Washington, D. C., 1915, p. 17.

When, for instance, the modal limits and the number of
cases falling within them are given, and when the total range
and the total number of cases are known, relative measures
of the dispersion within the modal group as compared with
that over the whole series may be computed. In the total section
of Table 47, the modal group of 196 cases falls at 18 to
20 per cent. That is, it covers a range of 2 per cent. The
range of the 1088 instances is 32 per cent. Accordingly, for
the entire series there are on the average 34 cases, and for
the modal group 98 cases for each one per cent of change.
The dispersion over the entire series, therefore, is approximately
three times as great as it is within the modal group.
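
The comparison just made reduces to two divisions and a ratio, sketched below in Python with the figures quoted from Table 47. The variable names are introduced here for illustration only.

```python
# Figures quoted in the text from the total section of Table 47.
modal_cases, modal_width = 196, 2      # modal group: 18 to 20 per cent
total_cases, total_width = 1088, 32    # the whole series

# Cases falling, on the average, within each one per cent of range.
modal_density = modal_cases / modal_width
overall_density = total_cases / total_width

# How many times more thinly the whole series is spread than the modal group.
ratio = modal_density / overall_density

print(modal_density, overall_density, round(ratio, 2))
```

The ratio comes out somewhat under three, which agrees with the "approximately three times" of the text.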


(2) The Decile Method (Graphic) for Time Series

The deciles may also be used to show graphically amounts 
of dispersion. Professor Mitchell has used them in two inter- 
esting ways: first, to show by years the dispersion of relative 
wholesale prices for 1890 to 1910, and second, to show by 
years the dispersion of the change in wholesale prices from 
1891 to 1918. 

In the first use,¹ the prices of 145 commodities in each
year are computed as percentages of their prices in 1890 to
1899. That is, in each year there are 145 relative numbers
or per cents. These are arranged in order of size each year
and the nine deciles computed.² The deciles and the extremes
are shown in Table 52. The amount of dispersion may be
calculated arithmetically or shown graphically.


¹ Mitchell, Wesley C., Business Cycles, University of California Studies,
Berkeley, 1913, p. 112.

² The formulae for computing the 1st, 2nd, and 7th deciles, respectively, are

\[
\frac{n+1}{10}, \qquad \frac{2(n+1)}{10}, \qquad \frac{7(n+1)}{10}.
\]

In all cases n refers to the number of items.
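
The position formulae may be turned into a short routine. The Python sketch below is offered only as an illustration: it locates the k-th decile at position k(n + 1)/10 among the ordered items and, as an assumption not stated in the text, interpolates linearly between neighboring items when that position is not a whole number. The function name `decile` and the sample data are hypothetical.

```python
def decile(items, k):
    """Value of the k-th decile (k = 1, ..., 9) of a list of items.

    The position among the ordered items is k(n + 1)/10, counting from 1.
    Linear interpolation for fractional positions is an assumption of
    this sketch.
    """
    ordered = sorted(items)
    n = len(ordered)
    position = k * (n + 1) / 10
    whole = int(position)            # whole part of the position
    fraction = position - whole      # fractional part of the position
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


# 145 hypothetical relative prices: 1, 2, ..., 145.
relatives = list(range(1, 146))
for k in (1, 5, 9):
    print(k, round(decile(relatives, k), 1))
```

With 145 items the first decile falls at position 14.6, that is, six tenths of the way from the 14th to the 15th item.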
TABLE 52

Table Showing the Deciles of Relative Wholesale Prices in
the United States, by Years, 1890-1910

(Taken from Mitchell, W. C., Business Cycles, p. 112)

       Lowest                                                               Highest
       relative   1st   2nd   3rd   4th           6th   7th   8th   9th    relative
Year   price      dec.  dec.  dec.  dec.  Median  dec.  dec.  dec.  dec.   price
1890     86        97   101   ...   108    112    116   119   126   133     160
1891     74        99   101   ...   109    111    113   116   122   132     158
1892     61        92    99   ...   104    107    108   111   114   118     141
1893     70        90    96   ...   102    104    106   109   111   119     158
1894     46        79    85    91    94     96     99   101   103   111     129
1895     53        79    86    88    91     94     95    98   100   105     149
1896     39        71    79    85    88    ...     92    95    98   ...     142
1897     56        71    78    85    88     91     93    95    98   102     128
1898     48        77    84    87    91     94     96    99   101   108     155
1899     46        86    89    94    97    ...    103   ...   112   129     149
1900     59       ...    98   102   106    109    113   118   123   136     192
1901     49       ...    97   101   104    107    111   115   120   133     222
1902     45        91    98   102   107    ...    114   119   ...   145     194
1903     43        90    98   104   108    111    114   121   129   143     192
1904     60        91    98   103   106    112    117   120   ...   143     197
1905     59       ...    97   104   110    114    120   126   131   149     238
1906     62        89   100   108   114    119    124   131   137   159     279
1907     42        95   104   112   121    ...    132   139   147   171     304
1908     45        89   102   107   113    119    124   130   139   156     228
1909     48        89   102   111   117    121    127   135   146   172     243
1910     48        86   103   112   118    124    132   144   154   187     363


Concerning the amount of dispersion as shown by the table,
Mitchell says: “In 1909, for example, one commodity had
a relative price as low as 48, and another had a relative price
as high as 243. Thus the arithmetic mean for that year, 121,
represents relative prices which are scattered over a range of
almost 200 points. But three-fifths of the 145 commodities
had relative prices falling within a much narrower range,
44 points, the difference between the second and eighth deciles,
and one-fifth fell within limits of ten points, the difference
between the fourth and sixth deciles.”¹

¹ Op. cit., p. 109.

FIGURE 64

Curves Showing, by the Range and the Decile Methods, the
Dispersion of the Fluctuations in Relative Wholesale
Prices of 145 Commodities, 1890-1910

A more effective method is to use a graphic device, such
as Figure 64, on which are plotted each year the different
deciles and the extremes. Dispersion each year is indicated
by the distances on the ordinates within which the respective
measures fall. As the different decile-lines converge, dispersion
decreases; as they diverge, dispersion increases. A continuous
and detailed picture is given of the spread or scatter.

The other graphic device used by Mitchell² in order to show
dispersion by the decile method is reproduced in Figure 65.
It is drawn on a logarithmic or ratio scale and

“shows for each year the whole range covered by the recorded
changes from prices in the preceding year by vertical lines, which
connect the points of greatest rise with the points of greatest fall.
These lines differ considerably in length, which indicates that price
changes cover a wider range in some years than in others. The
heavy dots upon the vertical lines show the positions of the deciles.
One-tenth of the commodities quoted in any given year rose above
their prices of the year before by percentages scattered between
the top of the line for that year and the highest of the dots. Another
tenth fell in price by percentages scattered between the bottom of
the line and the lowest of the dots. The fluctuations of the remaining
eight-tenths of the commodities were concentrated within the
much narrower range between the lowest and the highest dots. The
dots grow closer together toward the central dot, which is the median.
This concentration indicates, of course, that the number of
commodities showing fluctuations of relatively slight extent was
much larger than the number showing the wide fluctuations falling
outside the highest and lowest deciles, or even between these deciles
and the deciles next inside them.

“The middle dots or medians in successive years are connected by
a heavy black line, which represents the general upward or downward
drift of the whole set of fluctuations. To make this drift
clear the median of each year is taken as the starting point from
which the upward or downward movements in the following year
are measured. Hence the chart has no fixed base line. But in
this respect it represents faithfully the figures from which it is made;
since these figures are percentages of prices in the preceding year,
a price fluctuation in any year establishes a new base for computing
the percentage of change in the following year. The fact that
prices in the preceding year are the units from which all the changes
proceed is further emphasized by connecting the nine deciles, as well
as the points of greatest rise and fall, with the median of the year
before by light diagonal lines. The chart suggests a series of bursting
bomb shells, the bombs being represented by the median dots
of the years before and the scattering of their fragments by the
lines which radiate to the deciles and the points of the greatest rise
and fall.”³

² Mitchell, Wesley C., “Index Numbers of Wholesale Prices in the
United States and Foreign Countries,” Bulletin of the United States
Bureau of Labor Statistics, No. 284, Washington, D. C., 1921, p. 15,
and chart facing it.

³ “Owing to the constant shifting of the base line, no fixed scale of
relative prices can be shown on the margin of the chart. Because of its
intricacy, the chart had to be reproduced on a larger scale than in the
other cases, but of course that fact does not alter the slant of the lines,
and this slant is the matter of importance.”

2. THE METHOD OF AVERAGING DIFFERENCES FROM A TYPE

The measures and coefficients of dispersion described above,
while utilizing all or a part of the detail of statistical series,
are not based upon any assumption as to the manner in which
items are distributed about a norm or standard. No central
term such as arithmetic mean, median, or mode is taken as a
type from which divergence is summarized or averaged.

Those which are now to be described are quite different.
Deviations or differences from a central type are totaled and
averaged, and the amount of dispersion then expressed as a
ratio to the standard selected. This general method, of which
there are several modifications in current use, is based upon the
assumptions (1) that statistical series tend to be distributed
around their averages in a definite and regular manner, and,
therefore, that an average is the appropriate standard from
which to measure deviations (errors), and (2) that for such
distributions the deviations so taken have certain mathematical
properties which give the measures significance. Moreover,
for ideal or probability distributions the different measures
are related to each other by certain constants which it
is desirable to utilize. What these are and the manner in
which they are used will be developed as the different measures,
absolute and relative, are described.¹

(1) The Average Deviation

The average deviation is exactly what its name implies,
an average of the deviations. But what are deviations?
Deviations from what? And what sort of an average is used
to “average” them? For this measure, deviations are differences
from a selected standard. This may be the arithmetic
mean, the median, or the mode of the gross items. If a distribution
is normal, these averages coincide, and it is a matter
of indifference what name is applied to the norm taken. But
most distributions are not of this type (they are non-symmetrical
or skewed²), so that there is a difference between
them. If deviations are taken from the arithmetic mean,
their algebraic sum equals zero, but since interest is in the
amount of the deviations and not in their signs, all deviations
are counted as positive.

But why choose the arithmetic mean rather than the median
or the mode? One important reason for selecting the mean
is that it is always a definite quantity while the median
may be in doubt: there may be no actual quantity which divides
a series into equal parts. Moreover, the mode may be
ill-defined or there may be no mode at all. The sum of the
deviations from the median, however, is smaller than the sum
of those taken from any other quantity (that is, it is a
minimum), and this is a mathematical property which
it is desirable to use.³
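
Both properties mentioned above can be seen in a small numerical case. The Python sketch below assumes a hypothetical skewed series of five items (not taken from the text); it confirms that deviations from the arithmetic mean sum algebraically to zero, and that the sum of deviations with signs disregarded is smaller about the median than about the mean or about any of a range of other trial centers.

```python
from statistics import mean, median

# Hypothetical, markedly skewed series (illustrative values only).
items = [2, 3, 5, 9, 21]


def total_absolute_deviation(values, center):
    """Sum of the deviations from a center, all counted as positive."""
    return sum(abs(v - center) for v in values)


m = mean(items)        # arithmetic mean of the items
md = median(items)     # median of the items

algebraic_sum = sum(v - m for v in items)          # signs retained
from_mean = total_absolute_deviation(items, m)     # signs disregarded
from_median = total_absolute_deviation(items, md)

# Try centers from 0.0 to 29.9 in steps of one tenth.
smallest_trial = min(
    total_absolute_deviation(items, c / 10) for c in range(0, 300)
)

print(algebraic_sum, from_mean, from_median, smallest_trial)
```

No trial center gives a smaller total than the median does, in keeping with the minimum property cited in the text.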

¹ See Chapter XI, pp. 367-369.

² See Chapter XII.

³ By the use of an analogy, Bowley has shown that the sum of the
deviations is a minimum when calculated from the median. He says:
“. . . Suppose that it is required to run from a telephone exchange
separate wires to every one of n places in a straight line, where should
the exchange be placed, so as to use the least total amount of wire? At
the median position. For if you move from the median position to the
right or to the left, you will find immediately that you are adding more
wire than you are subtracting. Supposing there are 20 stations, and
you have a position between the 10th and 11th; if you move to a position
between the 11th and 12th, you have to increase your distance from
10 stations and diminish it from 9, in every case by the same length of
the wire. The wires correspond to deviations; and the sum of lengths
of the wires is the sum of the lengths of the deviations. Consideration
of this illustration will show that the sum of the deviations is a minimum
when they are measured from the median, but that the median is not
quite determinate, for if there are an even number of stations, the sums
of the deviations measured from all points between the two central stations
are the same.” Bowley, A. L., Measurement of Groups and Series,
Layton, London, 1903, p. 30.

Accordingly, mathematical consistency seems to demand
that the median be used. But what is to be done if there is
no true median? This is often the case in discrete series. To
measure the deviations from a median secured by interpolation
may make the sum of the deviations greater or less than
those secured by using the arithmetic mean. While ideally
the median should be used, necessity often requires that the
deviations be computed from the arithmetic mean.¹

¹ In moderately asymmetrical distributions the difference in the aggregate
in the two cases would be small; in those which are markedly
skewed, it may be appreciable.

² The median of the deviations from the average, if they are all taken
as positive, is equivalent in a normal curve of error to the “probable
error.” For explanation of this constant, see Chapter XI, pp. 370-374.

But the deviations, although taken from an average of some
sort, are themselves averaged. For this purpose, the arithmetic
mean is customarily used. But why? Is not the median of
the different deviations quite as suitable?² Why use an average
at all? Why not express them in some form which will
not average out the differences but which will develop the
typical amounts? For the latter purpose the mode might be
chosen, or even a frequency distribution employed. But the
mode of the differences may be quite as uncertain in amount
as that of the original items.* If precision is desired, the use
of both a mode and a frequency distribution must be ruled 
out, and the customary method used. To take the arithmetic 
mean of the sum of the differences gives a definite quantity and 
reduces series with different frequencies to a comparable basis. 

But like the average of the original items it is an average. 
It does not give the deviations in detail, but only records a 
type. When they are uniform and small, it does this satis- 
factorily. When they are large and different, it fails here as 
it does with the gross items. Moreover, it is impossible to 
determine from the average alone which condition obtains.
To do so requires that they be arranged into frequency groups
or that the method of cumulative- or moving-range be used. 
When this is necessary must be determined by the data and 
the purposes for which they are used. 

In the following examples the method of computing the 
average deviation is fully illustrated. 

a. The Average Deviation in Historical Series 

Table 53 gives the quantity of tin plates imported into 
the United States, 1906-1915, inclusive, in millions of pounds. 
By disregarding signs and combining the deviations the 
total is 502.8. The average is therefore 502.8 ÷ 10 = 50.28.
That is, the average difference of the various amounts im- 
ported from the average imported is 50.28 million pounds. 
The average itself is 86.6 million pounds. In one year the 
average is exceeded by 67.4 million pounds, while in another 
year the average imported exceeds the amount brought in 
in that year by 79.6 million pounds. The excess of the first 
is 78 per cent, and the deficit of the second 92 per cent, of 
the average. The average difference is 58 per cent of the 
average imported. 
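The arithmetic of this paragraph can be retraced in a few lines. The following is a minimal sketch, assuming the ten yearly amounts of Table 53; the variable names are chosen here for illustration only.

```python
# Tin plate imports, millions of pounds, 1906-1915 (Table 53).
imports = {1906: 121, 1907: 143, 1908: 141, 1909: 117, 1910: 154,
           1911: 95, 1912: 7, 1913: 28, 1914: 49, 1915: 11}

n = len(imports)
mean = sum(imports.values()) / n                 # arithmetic mean

# Signed deviation of each year's amount from the mean.
deviations = {year: amount - mean for year, amount in imports.items()}

# Average deviation: signs ignored, total divided by the number of items.
total_abs = sum(abs(d) for d in deviations.values())
average_deviation = total_abs / n

largest_excess = max(deviations.values())        # year furthest above the mean
largest_deficit = -min(deviations.values())      # year furthest below the mean

print(round(mean, 1), round(total_abs, 1), round(average_deviation, 2))
print(round(100 * largest_excess / mean),
      round(100 * largest_deficit / mean),
      round(100 * average_deviation / mean))
```

The second line of output expresses the largest excess, the largest deficit, and the average deviation as percentages of the average, the three ratios cited in the text.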

These differences are illustrated in Table 54. 

* Normally, the differences of the items in a series from an average are 
more alike than are the items themselves. 
TABLE 53

Table Showing the Quantity of Tin Plates Imported into the
United States, 1906-1915, Inclusive, in Millions of Pounds *

                                      Deviations from Average, 86.6
Years    Amount        Frequencies    -        +        Total (signs ignored)

Total    86.6 (av.)    10             251.4    251.4    502.8
1906     121           1                       34.4     251.4 (sum for 1906-1911)
1907     143           1                       56.4
1908     141           1                       54.4
1909     117           1                       30.4
1910     154           1                       67.4
1911     95            1                       8.4
1912     7             1              79.6              251.4 (sum for 1912-1915)
1913     28            1              58.6
1914     49            1              37.6
1915     11            1              75.6

* Statistical Abstract of the United States, 1915, p. 498.


TABLE 54

Table Showing in Classified Form the Differences from the
Average Importations of Tin Plates into the United States

(Based on Table 53)

Differences from the Average         Years in which the Corresponding
Importations (in Million Pounds)     Differences were Found
                                     Total     -       +

Total, 86.6 (average)                10        4       6
Less than 15.0                       1         ...     1
15 but less than 30.0                ...       ...     ...
30 but less than 45.0                3         1       2
45 but less than 60.0                3         1       2
60 but less than 75.0                1         ...     1
75 but less than 90.0                2         2       ...
Summarizing this table, it is shown that the positive and
the negative differences from the average range from 90 
to below 15 million pounds, six of the frequencies, when the 
deviations are taken positively, being between 30 and 60 mil- 
lion. The median difference when interpolated for is 55.4. 

The average deviation may also be computed from an as- 
sumed average. The following table using the above data 
illustrates the method: 

TABLE 55

Table Showing the Method of Computing the Average Deviation
when an Assumed Average is Used

(Data same as in Table 53)

                                  Deviations from Assumed Average = 90
Year     Amount    Frequencies    -       +       Total (signs ignored)

Total    866       10             265     231     496
1906     121       1                      31      231 (sum for 1906-1911)
1907     143       1                      53
1908     141       1                      51
1909     117       1                      27
1910     154       1                      64
1911     95        1                      5
1912     7         1              83              265 (sum for 1912-1915)
1913     28        1              62
1914     49        1              41
1915     11        1              79

Frequencies with + deviations, 6; with - deviations, 4.

The total error in deviations is 34, the difference between
265 and 231. Had the deviations been computed from the
true average the difference would have been zero. The average
error is, therefore, 34 ÷ 10, or 3.4. The deviations for six
of the frequencies are too small (they were computed from
90 in place of 86.6), and for four of them they are too large
for the same reason. Therefore (6 × 3.4) − (4 × 3.4), or 6.8,
must be added to the combined deviations, 496, to make up
for the error. This gives 502.8 as the correct sum of the deviations
when taken positively. The average deviation is, therefore,
502.8 ÷ 10, or 50.28, as in the first method above.
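The correction just described may be put in general form. The sketch below is one possible rendering, not a prescribed routine: the function name and its arguments are hypothetical, and it assumes, as holds in this example, that no item lies strictly between the true and the assumed average.

```python
def average_deviation_from_assumed(values, assumed):
    """Average deviation via an assumed average, corrected to the true mean.

    Assumes no item lies strictly between the true and the assumed
    average; otherwise the simple correction does not apply.
    """
    n = len(values)
    plus = sum(v - assumed for v in values if v > assumed)
    minus = sum(assumed - v for v in values if v < assumed)
    error = (minus - plus) / n            # assumed average minus true mean
    true_mean = assumed - error
    low, high = min(true_mean, assumed), max(true_mean, assumed)
    if any(low < v < high for v in values):
        raise ValueError("an item lies between the true and assumed averages")
    # Items on the far side of the assumed average were measured too short
    # (or too long) by `error`; items on the other side the reverse.
    n_high = sum(1 for v in values if v >= high)
    n_low = sum(1 for v in values if v <= low)
    corrected_total = (plus + minus) + error * (n_high - n_low)
    return corrected_total, corrected_total / n


amounts = [121, 143, 141, 117, 154, 95, 7, 28, 49, 11]   # Table 53
total, avg_dev = average_deviation_from_assumed(amounts, 90)
print(round(total, 1), round(avg_dev, 2))
```

The same corrected total results whether the guessed average lies above or below the true one, provided the stated condition is met.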

There is no presumption of a normal or ideal arrangement 
in a time series. The average deviation, therefore, loses some 
of the significance associated with it in the treatment of 
natural phenomena. In the case of economic statistics it may 
be highly artificial. By its very nature the differences are 
important not only because of their size but also because of 
their distance from the center of gravity. In the example 
in Table 53, the deviation of 8.4 is as important in the divisor
as is that of 79.6. Each constitutes one of the ten differences.
Of course, the median and the mode are differently
affected.^

b. The Average Deviation in Frequency Series 

In the discussion of the average deviation for frequency 
series there is no necessity of restating the essential differ- 
ences between those that are discrete and those that are con- 
tinuous in type. What has already been said in this respect 
applies here. The present task is to comprehend its meaning 
and see its application to economic and business facts when 
they are grouped in frequency series. 

Various types of frequency distributions are shown in Fig- 
ure 66. Even on casual inspection, it is evident that it is 
futile to attempt to summarize them by a single expression 
such as an average. The averages may be similar, but the 
distributions about them widely different. It is the latter 
which are now being considered. Taking a somewhat different 
series, the application is seen in Table 56. Provided the signs 
are ignored, the differences amount to $50.65. The average
difference is, therefore, $50.65 ÷ 37, or $1.37. That is, the
average difference from the arithmetic average is 32 per cent 
of the average, and varies, when weighted according to its 

^ See what is said relative to this point in Chapter IX, supra. 
FIGURE 66

Types of Frequency Distributions



importance, from the smallest positive difference of $.54 to 
the largest negative difference of $11.07. 
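When items repeat, each deviation is weighted by its frequency before the total is struck. The following sketch assumes the amounts and frequencies of Table 56 and takes the average at $4.23, as the table does; the names are illustrative only.

```python
# (amount in dollars, frequency) pairs as listed in Table 56.
table_56 = [(2.00, 4), (4.00, 3), (3.00, 9), (6.00, 5), (3.00, 2),
            (8.00, 3), (5.00, 6), (3.50, 3), (4.50, 2)]

n = sum(f for _, f in table_56)
mean = sum(amount * f for amount, f in table_56) / n
average = round(mean, 2)                 # the table works with $4.23

# Each deviation is multiplied by its frequency, signs being ignored.
weighted = [(amount, f * abs(amount - average)) for amount, f in table_56]
total_abs = sum(product for _, product in weighted)
average_deviation = total_abs / n

print(n, average, round(total_abs, 2), round(average_deviation, 2))
print(round(100 * average_deviation / average))   # per cent of the average
```

The products in `weighted` correspond to the entries in the last column of the table.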

TABLE 56

Table Showing the Method of Computing the Average Deviation
in a Simple Frequency Distribution

                         Deviations from True     Deviations Multiplied
                         Average, $4.23           by the Frequencies        Total (signs
Amount    Frequencies    -          +             -            +            ignored)

Total     37                                      $25.33 *     $25.32 *     $50.65
$2.00     4              $2.23                    8.92                      8.92
4.00      3              .23                      .69                       .69
3.00      9              1.23                     11.07                     11.07
6.00      5                         $1.77                      8.85         8.85
3.00      2              1.23                     2.46                      2.46
8.00      3                         3.77                       11.31        11.31
5.00      6                         .77                        4.62         4.62
3.50      3              .73                      2.19                      2.19
4.50      2                         .27                        .54          .54

* This negligible difference is due to taking the average as $4.23 rather
than as $4.22+.

The manner in which the average deviation is computed for
a grouped series is to assume for each group a uniform distribution
of the frequencies, or what is the same thing, to assume
that they are concentrated at the middle points, and proceed
as in the case above. Table 57, using a different set
of data, is illustrative.

The sum of the deviations is $610.60, and the average devia- 
tion $1.41. In this case, because of the concentration in the 
group $9.00 to $9.99, the average deviation is not much larger 
than the extent of this group, and is only 16 per cent of the 
average from which the deviations are computed. Moreover, 
the amount of dispersion in the frequency series in Table 57, 
relative to the average, is only one half as great as it is in the 
ungrouped series in Table 56. The clustering of the items at 
$9.00 to $9.99 shows that the average deviation is small, but 
it does not give it a numerical measure, nor does it localize it. 
TABLE 57

Table Showing the Method of Computing the Average Deviation
from a Group-Frequency Series

                                   Deviations from the    Product of Deviations     Total Deviations
                                   Average, $9.04         and Frequencies           (signs ignored)
Amounts            Frequencies     -         +            -            +

Total              434                                    $305.48 *    $305.12 *    $610.60
$5.00 to $5.99     15              $3.54                  53.10                     53.10
6.00 to 6.99       40              2.54                   101.60                    101.60
7.00 to 7.99       66              1.54                   101.64                    101.64
8.00 to 8.99       91              .54                    49.14                     49.14
9.00 to 9.99       113                       $.46                      51.98        51.98
10.00 to 10.99     49                        1.46                      71.54        71.54
11.00 to 11.99     30                        2.46                      73.80        73.80
12.00 to 12.99     27                        3.46                      93.42        93.42
13.00 to 13.99     2                         4.46                      8.92         8.92
14.00 to 14.99     1                         5.46                      5.46         5.46

* This negligible difference is due to taking the average to be $9.04
rather than $9.039+.


If the differences are calculated from an assumed average,
it is necessary to make a correction for the difference between
the guessed and the true average. The manner in which this
is done in frequency series is shown in Table 58.

The total error in deviations is $200.00, the difference between
$403.00 and $203.00. The average error is, therefore,
$200.00 ÷ 434, or $.461. But the deviations of 212 of the frequencies
are too large since they were computed from $9.50
instead of $9.04; and those of 222 are too small for the same
reason. Therefore, the difference between 212 × $.461 and 222
× $.461 must be added to the total deviations, $606.00, in
order to get the correct total. $606.00 − (212 × $.461) +
(222 × $.461) = $610.60, and this divided by the number of
instances, 434, equals $1.41, the correct average deviation.
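The same correction may be checked for the grouped series. In the sketch that follows each group is taken as concentrated at its middle point, which is the assumption of the text; the variable names are introduced for illustration, and the direct computation at the end is added only as a check.

```python
# Midpoints and frequencies of the wage groups (Tables 57 and 58).
groups = [(5.50, 15), (6.50, 40), (7.50, 66), (8.50, 91), (9.50, 113),
          (10.50, 49), (11.50, 30), (12.50, 27), (13.50, 2), (14.50, 1)]
assumed = 9.50

n = sum(f for _, f in groups)
minus = sum(f * (assumed - m) for m, f in groups if m < assumed)
plus = sum(f * (m - assumed) for m, f in groups if m > assumed)
error = (minus - plus) / n               # assumed minus true average
true_mean = assumed - error

# Groups at or below the true average were measured too long by `error`;
# groups at or above the assumed average were measured too short.
too_large = sum(f for m, f in groups if m <= true_mean)
too_small = sum(f for m, f in groups if m >= assumed)
corrected = (minus + plus) - too_large * error + too_small * error
average_deviation = corrected / n

# Direct computation from the true average, for comparison.
direct = sum(f * abs(m - true_mean) for m, f in groups)

print(minus, plus, round(error, 3), too_large, too_small)
print(round(corrected, 2), round(direct, 2), round(average_deviation, 2))
```

The corrected and the direct totals agree, which is all the correction is intended to secure.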


TABLE 58

Table Showing the Method of Computing the Average Deviation
in a Group-Frequency Series when an
Assumed Average is Used

                                   Deviations from         Product of
                                   Assumed Average,        Deviations and           Total Deviations
                                   $9.50                   Frequencies              (signs ignored)
Amounts            Frequencies     -         +             -           +

Total              434                                     $403.00     $203.00      $606.00
$5.00 to $5.99     15              $4.00                   60.00                    60.00
6.00 to 6.99       40              3.00                    120.00                   120.00
7.00 to 7.99       66              2.00                    132.00                   132.00
8.00 to 8.99       91              1.00                    91.00                    91.00
9.00 to 9.99       113
10.00 to 10.99     49                        $1.00                     49.00        49.00
11.00 to 11.99     30                        2.00                      60.00        60.00
12.00 to 12.99     27                        3.00                      81.00        81.00
13.00 to 13.99     2                         4.00                      8.00         8.00
14.00 to 14.99     1                         5.00                      5.00         5.00

Frequencies in the first four groups, 212; in the last six groups, 222.


The so-called “step-deviation” method, used in Chapter IX
for computing the arithmetic mean, may be used in connection
with the average deviation. Moreover, a consideration
to be kept in mind when the method employed in Table 56
is used, may be explained. Suppose an average of $10.50 is
assumed and that the average deviation is calculated for the
above series by the “step” method. Table 59 shows the
result.

The total error in step-deviations is 634, the difference
between 728 and 94. The average step-deviation error is,
therefore, 634 ÷ 434, or 1.46. The steps are all of $1.00 width,
so that the average step-deviation error, in terms of the unit
of measurement, is $1.00 × 1.46, or $1.46. But the combined
deviations, 822, are computed from $10.50 instead of $9.04,
the true average. Some of them are too small and some are
too large. Which are affected and how much? The deviations
of the 212 frequencies in the groups at $8.50 and lower are each
too large by $1.46 on the average. Those of the 109 frequencies
in the groups at $10.50 and higher are each too small by
the same amount. Those at $9.50, 113 in number, each show a
deviation of $1.00 if $10.50 is used. But $9.04 is the average,
and their true deviation from it is only $.46. Therefore, each
of the 113 is too large by the difference
between $1.00 and $.46, which is $.54.^ The total deviations
properly corrected are $822 − (212 × $1.46) + (109 × $1.46)
− (113 × $.54), which equals $610.60. The average deviation
is, therefore, $610.60 ÷ 434, or $1.41.
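The three-part correction can be followed step by step in code. This is a sketch under stated assumptions: the groups are represented by their middle points, the guessed average lies above the true one (as it does here), and the variable names are hypothetical.

```python
width = 1.00                     # every step (group) is one dollar wide
assumed = 10.50                  # guessed average, one group too high

# Midpoints and frequencies of the wage groups (Table 59).
groups = [(5.50, 15), (6.50, 40), (7.50, 66), (8.50, 91), (9.50, 113),
          (10.50, 49), (11.50, 30), (12.50, 27), (13.50, 2), (14.50, 1)]
n = sum(f for _, f in groups)

# Deviations counted in whole steps from the assumed average.
steps = [(round((m - assumed) / width), f) for m, f in groups]
minus_steps = sum(-s * f for s, f in steps if s < 0)
plus_steps = sum(s * f for s, f in steps if s > 0)

error = width * (minus_steps - plus_steps) / n   # assumed minus true average
true_mean = assumed - error

corrected = width * (minus_steps + plus_steps)   # uncorrected total
for m, f in groups:
    if m <= true_mean:           # measured too long by the full error
        corrected -= f * error
    elif m >= assumed:           # measured too short by the full error
        corrected += f * error
    else:                        # group lying between the two averages
        corrected += f * ((m - true_mean) - (assumed - m))

print(minus_steps, plus_steps, round(error, 2), round(true_mean, 2))
print(round(corrected, 2), round(corrected / n, 2))
```

The middle branch is the overlapping case: for the $9.50 group the adjustment per item is the true deviation less the recorded one, a reduction of about $.54.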

This seems a roundabout method of reaching a simple re- 
sult. It is, but only when the guessed average falls outside 
of the limits of the group which contains the true average. 
If it falls within this group, the method is simple and possesses 
merits for some uses. 

So much for the method of computing the average deviation
in both time and frequency series. Just a word of recapitulation.
The average deviation is an average. It does
not necessarily reflect the peculiarities of deviations any more 

^The reason for an overlapping is shown by diagram below: 
TABLE 59

Table Showing the Method of Computing the Average Deviation
in a Group-Frequency Series from an Assumed Average
by the “Step-Deviation” Method

                                   Deviations in “Steps”
                                   from Assumed           Product of Deviations
                                   Average, $10.50        and Frequencies          Total (signs
Amounts            Frequencies     -         +            -           +            ignored)

Total              434                                    728         94           822
$5.00 to $5.99     15              5                      75                       75
6.00 to 6.99       40              4                      160                      160
7.00 to 7.99       66              3                      198                      198
8.00 to 8.99       91              2                      182                      182
9.00 to 9.99       113             1                      113                      113
10.00 to 10.99     49
11.00 to 11.99     30                        1                        30           30
12.00 to 12.99     27                        2                        54           54
13.00 to 13.99     2                         3                        6            6
14.00 to 14.99     1                         4                        4            4

Frequencies in the first four groups, 212; in the $9.00 to $9.99 group, 113;
in the last five groups, 109.

than the arithmetic mean does of data from which it is computed
originally, except for the fact that the respective variations
from the average deviation are usually not as large as
are the variations of the original data from their average.
If it is large it shows relative dispersion; if it is small it shows
relative concentration. The exceptions are weighted in this
case in the same way that they are in any arithmetic mean. 
If the median or modal deviations are used, then they exert 
less weight. If the cumulative-range method is used, they 
are thrown into prominence in detail. 

Average deviations are reduced to a relative base by dividing
them by the averages from which they are computed. By
so doing they are reduced to a common denominator.^ Comparisons
can then be made between dispersions in different
series. This would be impossible by the use of measures of
dispersion alone for series in which the averages are unequal
and for those expressed in different units. To divide the
average deviation by the average produces a ratio or coefficient.

The relative dispersion in the frequency distribution used
as an example is .156. That is, it is the ratio secured by
dividing $1.41, the average amount of dispersion, by $9.04,
the average from which dispersion of the items is measured.

(2) The Standard Deviation 

The standard deviation is a modification of the average deviation.
It is computed (1) by taking the respective deviations from
the arithmetic average, (2) by squaring them, thus getting rid
of the minus signs, (3) by dividing the total by the number of
frequencies, and (4) by extracting the square root of the
quotient. In the formula, n refers to the number of instances
(frequencies); d² to the deviations squared; Σ is the Greek
capital letter S and means “the process of summation.” In
this case the amounts to be summated or totaled are the products
of the frequencies and the squares. The standard deviation
is usually indicated by small sigma, σ, or S. D. The
formula by which it is calculated is

\[ \sigma = \sqrt{\frac{\Sigma d^{2}}{n}} \]

Squaring gives weight to extremes, those deviations far removed
from the average. This is not fully compensated for
in the subsequent root extraction. In frequency distributions
which follow the normal law of error, or which are moderately
asymmetrical, instances far removed from the average
are relatively few, so that the products of the squares and
the frequencies at these points are due more to the squaring
than to the multiplication. Near the average, however, frequencies
are relatively numerous and the products affected by
the concentration. In averaging the squares of the deviations,
the frequencies, as such, exert equal weight, since the total
is simply divided by the sum of the frequencies.

^ On the graphic method of indicating absolute and relative dispersion,
see Clark, Earle, “The Horizontal Zero in Frequency Diagrams,” in Quarterly
Publications of the American Statistical Association, June, 1917,
pp. 662-669. This article is reprinted in the writer’s Readings and Problems
in Statistical Methods, Macmillan & Company, New York, 1920,
pp. 385-394.

In time or historical series the case is somewhat different. 
There is no multiplication of deviations by frequencies, since 
each item appears but once. The squaring alone is effective. 
Of course, distance from the average is still important, but 
this is neither accentuated nor minimized by the distribution 
of frequencies. Just as the sum of the deviations is a minimum
(that is, least) when calculated from the median, so
the sum of the squares of the deviations is a minimum when
calculated from the arithmetic mean. This follows from the 
principle that the nearest approach to the mathematically cor- 
rect measure or observation in a series is the arithmetic mean, 
and that errors in observation are distributed about this 
center according to the rule of squares.^ 
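The minimum property of the arithmetic mean stated above can be verified by a short piece of algebra. In the sketch that follows, x_i denotes the items, n their number, m their arithmetic mean, and a any other value from which deviations might be measured; these symbols are introduced here for illustration and are not the author's notation.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - a)^2
  &= \sum_{i=1}^{n} \bigl[(x_i - m) + (m - a)\bigr]^2 \\
  &= \sum_{i=1}^{n} (x_i - m)^2
     + 2\,(m - a)\sum_{i=1}^{n} (x_i - m)
     + n\,(m - a)^2 \\
  &= \sum_{i=1}^{n} (x_i - m)^2 + n\,(m - a)^2 ,
  \qquad \text{since } \sum_{i=1}^{n} (x_i - m) = 0 .
\end{align*}
```

Since the last term is never negative, the sum of squares is least when a = m. The same identity accounts for the correction applied when deviations are squared from an assumed average: the sum of squares about the assumed average exceeds the sum about the true mean by n times the square of the difference between the two averages.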

For many economic and business purposes interest lies 
chiefly in the thing that is characteristic. Legislation is not 
generally enacted for the few, but rather for the many. Busi- 
ness policies are most frequently mapped out and changed in 
the light of that which seems to be characteristic. Sometimes, 
however, it is the exception which is suggestive, or which calls 
attention to the need for change. For instance, an exceptionally
large sale, one far removed from the characteristic performance,
may suggest possibilities in management and
deserve to be emphasized both because of its stimulating
effect on future performances on the part of salesmen, and
because of its suggestive power to the management as to the
need of reorganizing the selling force. Wide dispersion of employes’
earnings in piece-work establishments may suggest to
a keen business management the possibilities of redistributing 
his labor service according to capacity and proved ability. 
The losses resulting from a haphazard use of labor force, when 

^ See Yule, G. Udny, Introduction to the Theory of Statistics, Griffin,
London, 1911, pp. 134-136.
measured in terms of discontent, turnover of labor, etc., may
well make it advisable to assign more importance to the ex- 
ception than that which would follow from its mere numerical 
significance. The inequalities of wealth distribution carry 
with them a significance far greater than that indicated by 
amounts alone. 

So long as it is desired to give moderate weight to large 
differences, the average deviation may be used. When 
interest shifts to that which is exceptional, means of throwing
it into light are needed. Of course, in statistics of economics
and business there is generally not the same presumption
of normal distribution as there is in statistics of natural
phenomena. Interest in deviations from type in the two cases
is of a different kind. Respecting the latter, deviations are
important as showing non-conformity to an abstract standard;
respecting the former, as means of calling attention, for instance,
to useless waste, to unnecessary sources of industrial
disorder, etc. Approach in the two cases may be different,
but the means of measuring the concentration or dispersion 
is the same. To cite an average alone is frequently inadequate 
in economics, even for general purposes. But to use both an 
average and the standard deviation gives a rather definite 
idea of distribution about this figure. The latter serves more 
accurately to define the average. Moreover, average and 
standard deviations bear a more or less definite relation to 
each other in distributions which approach the normal law. 
As Yule says, 

“It is a useful empirical rule for the student to remember that for
symmetrical or only moderately asymmetrical distributions, approaching
the ideal forms, the mean deviation is usually very
nearly four-fifths of the standard deviation.” ^

Again, the standard deviation bears a more or less fixed 
relation to the total frequencies. Respecting this, Yule says: 

^ Yule, G. Udny, Introduction to the Theory of Statistics, Griffin, London,
1911, p. 146.
“It is a useful empirical rule to remember that a range of six times
the standard deviation usually includes 99 per cent or more of all the
observations in the case of distributions of the symmetrical or moderately
asymmetrical type.” ^

How nearly this is true for the frequency distributions 
chosen for example is evident on inspection. 
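The inspection may be made explicit for the grouped wage series. The sketch below assumes the midpoints and frequencies of Table 57 and treats every item as lying at the middle of its group, so that the share falling within the range is approximate; the names are illustrative.

```python
from math import sqrt

# Midpoints and frequencies of the wage groups (Table 57).
groups = [(5.50, 15), (6.50, 40), (7.50, 66), (8.50, 91), (9.50, 113),
          (10.50, 49), (11.50, 30), (12.50, 27), (13.50, 2), (14.50, 1)]

n = sum(f for _, f in groups)
mean = sum(m * f for m, f in groups) / n
avg_dev = sum(f * abs(m - mean) for m, f in groups) / n
std_dev = sqrt(sum(f * (m - mean) ** 2 for m, f in groups) / n)

# First rule: the mean deviation is about four-fifths of the standard deviation.
ratio = avg_dev / std_dev

# Second rule: a range of six times the standard deviation, centered on the
# mean, should include 99 per cent or more of the observations.
low, high = mean - 3 * std_dev, mean + 3 * std_dev
inside = sum(f for m, f in groups if low <= m <= high)
share = inside / n

print(round(ratio, 2), inside, round(100 * share, 1))
```

On this reckoning only the single item in the highest group falls outside the six-sigma range, and the ratio of the two deviations is very close to four-fifths.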

a. The Standard Deviation in Historical or Time Series 

Using the time series of Table 53, the standard deviation is 
computed as follows, when the direct method is used: 

TABLE 60

Table Showing the Method of Computing the Standard Deviation
for Historical Series Using the Direct Method
(Data same as in Table 53)

                                      Deviations from                       Squared,
                                      Average, 86.6                         Multiplied by
Years    Amount        Frequencies    -        +        Squared             Frequencies

Total    86.6 (av.)    10                                                   29,760.40
1906     121           1                       34.4     1,183.36            1,183.36
1907     143           1                       56.4     3,180.96            3,180.96
1908     141           1                       54.4     2,959.36            2,959.36
1909     117           1                       30.4     924.16              924.16
1910     154           1                       67.4     4,542.76            4,542.76
1911     95            1                       8.4      70.56               70.56
1912     7             1              79.6              6,336.16            6,336.16
1913     28            1              58.6              3,433.96            3,433.96
1914     49            1              37.6              1,413.76            1,413.76
1915     11            1              75.6              5,715.36            5,715.36

The deviations squared and totaled amount to 29,760.40.
The standard deviation is, therefore, \( \sqrt{29{,}760.40 \div 10} \), or \( \sqrt{2{,}976.04} \),
or 54.5. The average deviation, 50.28, is 92.3 per cent of this
amount.

p. 140. 
In Table 61, the deviations are taken from the assumed
average, 90.0, instead of the true average, 86.6. The average
error in the deviations is, therefore, 3.4. This must be squared,
multiplied by the number of frequencies, and then subtracted
from 29,876 in order to get the correct deviations squared.
The square of 3.4 is 11.56, and when multiplied by 10, the
number of frequencies, is 115.6. The difference between this
amount and 29,876 is 29,760.4. Dividing this amount by 10
gives 2,976.04, the square root of which, 54.5, is the standard
deviation. The problem is somewhat simplified by taking the
deviations from an assumed
average because the items to be squared are whole numbers.

Of course, in actual work it is unnecessary to multiply the
deviations by the frequencies since they are all unity. It was
done here in order that all the steps might be followed.
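Both routes to the standard deviation of the tin plate series can be set side by side. The following is a minimal sketch assuming the ten amounts of Table 53; the variable names are chosen here for illustration.

```python
from math import sqrt

amounts = [121, 143, 141, 117, 154, 95, 7, 28, 49, 11]   # Table 53
n = len(amounts)
mean = sum(amounts) / n

# Direct method: squares of the deviations from the true average.
direct_ss = sum((x - mean) ** 2 for x in amounts)

# Short-cut: squares of the deviations from an assumed average of 90,
# less n times the square of the error in the assumed average.
assumed = 90
assumed_ss = sum((x - assumed) ** 2 for x in amounts)
error = assumed - mean
corrected_ss = assumed_ss - n * error ** 2

std_dev = sqrt(corrected_ss / n)

print(assumed_ss, round(n * error ** 2, 1), round(corrected_ss, 1))
print(round(direct_ss, 1), round(std_dev, 2))
```

Carried to two decimals the root is 54.55, which the text gives as 54.5.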

TABLE 61

Table Showing the Method of Computing the Standard Deviation
for Historical Series Using the Direct Method but
an Assumed Average

(Data same as in Table 53)

                                      Deviations from                       Squared,
                                      Assumed Av., 90.0                     Multiplied by
Years    Amount        Frequencies    -        +        Squared             Frequencies

Total    86.6 (av.)    10                                                   29,876
1906     121           1                       31       961                 961
1907     143           1                       53       2,809               2,809
1908     141           1                       51       2,601               2,601
1909     117           1                       27       729                 729
1910     154           1                       64       4,096               4,096
1911     95            1                       5        25                  25
1912     7             1              83                6,889               6,889
1913     28            1              62                3,844               3,844
1914     49            1              41                1,681               1,681
1915     11            1              79                6,241               6,241
b. The Standard Deviation in Frequency Series

The method of calculating the standard deviation is the
same for frequency as for time series, but it may be helpful
to carry through an example when the direct and the indirect
methods are employed. Taking the data in Table 58,
and assuming the average to be $9.50 (the true average being
$9.04), the short-cut method is as shown in Table 62.

The sum of the squares of the deviations from the guessed
or assumed average is $1,424.00. But the average error is
$.461. The square of $.461 is $.212. This amount multiplied
by the number of frequencies, 434, gives $92, and this
amount, when subtracted from $1,424, gives $1,332 as the correct
deviations squared. But since it is the average of the
squared deviations that is desired, it is necessary to divide
TABLE 62

Table Showing the Method of Computing the Standard Deviation
for Frequency Series by Using the Short-Cut Method
and an Assumed Average

(Data same as in Table 58)

                                   Deviations from                       Squared,
                                   Assumed Av., $9.50                    Multiplied by
Amounts            Frequencies     -         +           Squared         Frequencies

Total              434                                                   $1,424.00
$5.00 to $5.99     15              $4.00                 $16.00          240.00
6.00 to 6.99       40              3.00                  9.00            360.00
7.00 to 7.99       66              2.00                  4.00            264.00
8.00 to 8.99       91              1.00                  1.00            91.00
9.00 to 9.99       113
10.00 to 10.99     49                        $1.00       1.00            49.00
11.00 to 11.99     30                        2.00        4.00            120.00
12.00 to 12.99     27                        3.00        9.00            243.00
13.00 to 13.99     2                         4.00        16.00           32.00
14.00 to 14.99     1                         5.00        25.00           25.00
this number by 434. The result is $3.07. The square root of
$3.07, $1.75, is the standard deviation. The average deviation,
$1.41, is 81 per cent of this amount.

The standard deviation of a series is somewhat larger than 
its average deviation. If the distribution is normal in the 
probability sense, the two measures of variability stand in 
the following relation: 

σ or S. D. = 1.2533 A. D., or conversely,

A. D. = 0.7979 σ or S. D.

Applying this formula to the example used as an illustration,
the relation between the average and the standard deviations
is as 1 : 1.2413, or conversely, 0.8056 : 1. That is, the distribution
approaches very nearly the normal or probability type.
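The short-cut computation and the comparison with the normal constants may be retraced as follows. The sketch assumes the midpoints and frequencies of Table 62; the names are illustrative, and the identification of the two normal-curve constants with the square roots of π/2 and 2/π is a standard result stated here for reference rather than drawn from the text.

```python
from math import pi, sqrt

# Midpoints and frequencies of the wage groups (Table 62).
groups = [(5.50, 15), (6.50, 40), (7.50, 66), (8.50, 91), (9.50, 113),
          (10.50, 49), (11.50, 30), (12.50, 27), (13.50, 2), (14.50, 1)]
assumed = 9.50

n = sum(f for _, f in groups)
ss_assumed = sum(f * (m - assumed) ** 2 for m, f in groups)
error = sum(f * (assumed - m) for m, f in groups) / n   # assumed minus true
corrected_ss = ss_assumed - n * error ** 2
std_dev = sqrt(corrected_ss / n)

true_mean = assumed - error
avg_dev = sum(f * abs(m - true_mean) for m, f in groups) / n

print(ss_assumed, round(corrected_ss), round(std_dev, 2), round(avg_dev, 2))
print(round(std_dev / avg_dev, 3))          # ratio found in this series
print(round(sqrt(pi / 2), 4), round(sqrt(2 / pi), 4))   # normal-curve ratios
```

Worked from unrounded figures the ratio for this series is about 1.245; the figure in the text rests on the rounded values $1.75 and $1.41. Either way it lies close to the normal constant.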

If the same distribution and a guessed average are used 
and the deviations are taken in terms of “steps,” the method
is the same, except that it is necessary to convert the steps 
into terms of the unit employed by multiplying by the size 
of the group. In this case the step is $1.00. If the widths 
of groups had been $.50, for instance, the conversion would 
have been made by multiplying the number of steps by one 
half dollar. 

If deviations from the actual average, as they appear in
Table 57, are used, the process is the same but somewhat more 
laborious to carry through since the deviations to be squared 
are not whole numbers. Of course, in such a case it is unnec- 
essary to make a correction for errors in the deviations. They 
are correct by assumption. 

In order to convert the standard deviation into a coefficient
(that is, to relieve the data of the particular unit in which
they are expressed, and to make comparisons possible between
two series in which absolute units are different), it is only
necessary to divide by the arithmetic mean, the figure from
which the deviations are computed. The coefficient of dispersion
for this series, based on S. D., is $1.75 ÷ $9.04, or .194.
(3) The Quartile Measure

The quartile measure of dispersion applies to that portion 
of a distribution contained between the first and third 
quartiles. The extremes below the first and beyond the 
third quartiles are ignored. It serves to characterize that por- 
tion which lies nearest the average or type. This measure, 
like the average and standard deviations, is an average. It 
is not, however, calculated from the differences of the items 
from the arithmetic mean. By taking one half of the range 
contained in the middle half of a distribution, the measure 
shows the average deviation of the quartiles from the median. 

The formula is \( \frac{Q_3 - Q_1}{2} \), where \( Q_3 \) and \( Q_1 \) stand for the third
and first quartiles, respectively. The third quartile lies above
the median; the first one below it. One half of all the fre- 
quencies lies between them. This measure is known as the 
semi inter-quartile range or quartile deviation and is fre- 
quently indicated by Q. In distributions which are symmetri- 
cal, the amounts secured by the use of the formula when added 
to the lower or subtracted from the upper quartile give the 
median. In those which are asymmetrical, such an amount 
may be greater or less than the median, depending upon the 
type of asymmetry. Because this measure, although based 
upon the method of limits, is used in connection with the 
median (a type amount), it is discussed here rather than in
the section of the chapter devoted to the Method of Limits.

In symmetrical or moderately asymmetrical distributions 
the relation between the quartile and the standard deviation 
measures of dispersion is fairly constant and predictable. The 
first is generally about two thirds of the second, and nine
times the first usually contains about 99 per cent of the range
covered by the entire distribution.^ How nearly this relation
obtains in the distribution chosen as an illustration is shown
by the following compilations: In Table 43, the median, by
interpolation, is fixed at $9.049. The first and third quartile
positions, by the formulas \( \frac{n+1}{4} \) and \( \frac{3(n+1)}{4} \), respectively,
are the 108¾th and 326¼th men. The wages of these hypothetical
individuals, when interpolated for, are $7.81 and
$10.03, respectively. The quartile range is, therefore, $10.03
− $7.81, or $2.22. The average range is $2.22 ÷ 2, or $1.11.^ For
the same series the average deviation is $1.41, and the standard
deviation $1.75. The semi inter-quartile range, therefore,
is equal to 79 per cent of the former and 63 per cent of the
latter. The extreme range of $10.00 (the difference between
$5.00 and $15.00) is almost exactly nine times the quartile
measure, $1.11.

^ Yule, op. cit., p. 3,48.
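The interpolation for the quartiles can be sketched as follows. Table 43 is not reproduced in this part of the chapter; the sketch assumes that it contains the same wage distribution as Table 57, an assumption which reproduces the median and quartiles quoted above. The function name `value_at` is hypothetical.

```python
# (lower limit of group, frequency); assumed to be the data of Table 43.
groups = [(5.00, 15), (6.00, 40), (7.00, 66), (8.00, 91), (9.00, 113),
          (10.00, 49), (11.00, 30), (12.00, 27), (13.00, 2), (14.00, 1)]
width = 1.00


def value_at(position, groups, width):
    """Wage of the item at a given (possibly fractional) position,
    assuming items are spread evenly through each group."""
    cumulative = 0
    for lower, f in groups:
        if position <= cumulative + f:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("position exceeds the number of items")


n = sum(f for _, f in groups)
q1 = value_at((n + 1) / 4, groups, width)          # the 108 3/4th man
median = value_at((n + 1) / 2, groups, width)
q3 = value_at(3 * (n + 1) / 4, groups, width)      # the 326 1/4th man
quartile_deviation = (q3 - q1) / 2

print(round(q1, 2), round(median, 3), round(q3, 2),
      round(quartile_deviation, 2))
```

The even spreading of items within each group is the same assumption on which the grouped averages and deviations rest.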

Like other measures of dispersion the semi inter-quartile 
range may be reduced to a relative basis, or made a coefficient, 
by dividing through by a common denominator. In this case, 
the appropriate divisor is the sum of the quartiles. The fraction
\((Q_3 - Q_1)/(Q_3 + Q_1)\) varies with the distance between the
quartiles but always lies between 0 and 1. Size, therefore, is
a test of relative dispersion. In the above example the coefficient
is $2.22 divided by $17.84, or about .124; dispersion is
relatively small. It is 79 per cent of the coefficient based on
the average deviation and 64 per cent of the coefficient based
on the standard deviation.
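
The arithmetic of this illustration may be checked with a short sketch. It uses only the figures quoted in the text; the variable names are illustrative, and the coefficient is assumed to be the difference of the quartiles divided by their sum.

```python
# Figures quoted in the text for the wage series of Table 43.
q1, q3 = 7.81, 10.03             # first and third quartiles, in dollars
average_deviation = 1.41
standard_deviation = 1.75

quartile_range = q3 - q1                  # $2.22
semi_interquartile = quartile_range / 2   # $1.11

# Assumed relative form: difference of the quartiles over their sum.
quartile_coefficient = (q3 - q1) / (q3 + q1)

ratio_to_ad = semi_interquartile / average_deviation
ratio_to_sd = semi_interquartile / standard_deviation

print(f"semi inter-quartile range : {semi_interquartile:.2f}")
print(f"share of average deviation: {ratio_to_ad:.0%}")
print(f"share of standard deviation: {ratio_to_sd:.0%}")
print(f"quartile coefficient      : {quartile_coefficient:.3f}")
```

Because the quartiles are positive and the upper exceeds the lower, the coefficient so defined must fall between 0 and 1, whatever the unit of measurement.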

For many purposes a study of the semi inter-quartile range 
is sufficient. This may result from the nature of a distribu- 
tion or from lack of interest in the extreme cases. However, 
to cite only this measure may prejudice a case for all purposes
except those which are under discussion. In order to
guard against misunderstanding and to give expression to all
of the peculiarities of a distribution, it is generally better to
determine the average, the standard, and the quartile deviations.
A comparison of these gives an accurate picture of a
distribution.

¹ In discrete series, interpolation in units less than those in which
data are measured is illogical and aims at too great accuracy. For most
purposes the quartiles would be given with sufficient accuracy as $7.80
and $10.00.


IV. SUMMARY

Measures and coefficients of dispersion serve more accu- 
rately to describe statistical series than is possible by the use 
of averages alone. They are more refined statistical sum- 
maries, the amounts with which they have to do being the 
differences of the items one from another, or from a standard 
which is considered typical or representative. When using 
them in historical series, nothing can be implied about the 
type of distribution. In frequency series, on the other hand, 
the selection of a type from which to measure the deviations 
suggests some natural or normal order of distribution. More- 
over, the relations between the constants for normal curves 
establish a standard by which those found in individual cases 
may be judged or appraised. But what does the use of these 
constants imply? What are meant by such expressions as
the “normal law of error curve” and “a normal distribution”?
Briefly to answer these questions is the subject of the following
chapter.

CHAPTER XI


THE THEORY OF PROBABILITY AND SOME
PROPERTIES OF THE NORMAL LAW OF
ERROR DISTRIBUTION ¹

I. Outline of the Theory of Probability 

In the measurements of natural and physical phenomena one 
is struck both by the similarities and the differences in dif- 
ferent members of a class, or in repeated measurements of the 
same class. While results vary, they fall within clearly defined 
limits. In the absence of bias on the part of the one mak- 
ing the measurements, of changes in the unit, of the accuracy 
at which he aims; of the nature of the thing measured and of 
the unit in which the results are stated, there tends to be a com- 
mon or typical measurement from which others deviate above 
and below in a more or less regular and systematic manner. 

To illustrate: If the heights of a large number of a 
homogeneous class of men (say soldiers ²) are measured to the

¹ A discussion of only the simplest phases of these subjects is suitable
to an introductory text on statistical methods. The theory of probability
belongs in the realm of mathematics as do also the more serious discussions
of the properties of the normal law of error distribution. Both
subjects are fully treated in the following among other books: Fisher,
Arne, The Theory of Probability, Macmillan & Co., New York, 1922;
Keynes, J. M., A Treatise on Probability, Macmillan & Company, London,
1921; less complete discussions are found in Pearl, Raymond, Introduction
to Medical Biometry and Statistics, Saunders, Philadelphia, 1923,
Chap. XI; Jones, D. C., A First Course in Statistics, Bell & Sons, London,
1921, Chaps. XII, XIII, and XVIII; Jevons, W. Stanley, Principles
of Science, Macmillan & Company, London, 2nd Edition, 1920, Chap. X.

² See Yule, G. U., An Introduction to the Theory of Statistics, Griffin,
London, 1911, Chap. VI, for frequency graphs of measurements of heights
of 1,078 “English sons”; 1,000 Cambridge students; weight of 7,749
adult males in the British Isles, etc.

nearest quarter of an inch, differences will be found. Some
men who may be termed “tall,” by any reasonable standard,
will be encountered. Similarly, some who are “short” will be
found. The measurements, however, will cluster at or around
a certain height which may be called modal. If, on the other
hand, a non-homogeneous group of people (such, for instance,
as that found at a Fair on a given day) were measured in
the same way, the distribution of the results would be different.
Those who are “short” for one class would be “tall” for
another. Moreover, there is no necessary basis for expecting
the heights definitely to cluster at a certain typical or modal
measurement and shade off gradually above and below. A 
distribution of such an aggregate would probably have two 
modes. The same thing would be true of the sales of sales- 
men in 5 and 10 cent stores and of those in the furniture sec- 
tions of department stores. Why? Because in this and the 
foregoing example the phenomena are non-homogeneous. 

Moreover, if the measurements of a homogeneous soldier 
group were made by several individuals with different standards
of accuracy, affected by personal bias, or with non-uniform
units of measurement, the results would not cluster
about a type, and shade off systematically above and below. 
Why? Because the conditions of measurement are not uni- 
formly applied. 

To take another illustration. If the weights of a sufficiently 
homogeneous “population” of hogs at the Chicago stockyards
were taken at a given time, the measurements being free from 
bias affecting the unit of measurement, the standard of ac- 
curacy and the sensitiveness of the scales, the weights would 
cluster about a norm or typical amount. If, on the other 
hand, they were taken over a period of time, during which 
methods of breeding, fattening, shipping, etc., made the re- 
ceipts non-homogeneous, then no such type of distribution 
could be expected. Why? Because time has introduced an 
element of non-homogeneity or bias. 

If, rather than measuring different members of a class a 





number of times, a single example is subjected to many mea- 
surements, then, in the absence of bias affecting the purpose, 
intent, and prejudice of the one making the measurement, or 
the unit which is employed for this purpose, the normal type 
of distribution or a close approximation to it would result. 
Since by hypothesis accuracy is aimed at and non-homo- 
geneous conditions — bias in every form — are removed, a typi- 
cal or characteristic result would be secured. From this, 
however, there would be both negative and positive deviations 
since absolute uniformity is not to be expected. But these 
would be fewer in number than those which are termed char- 
acteristic or most common. 

Similar illustrations drawn from other fields might be cited 
at length, but they would not add materially to the point 
which is being developed. 

Let us approach the subject from a different angle. If a 
coin is tossed it may fall either heads or tails. It must fall 
either heads or tails; it cannot fall both in a single trial. If 
there is no reason why it should fall one way in preference to 
the other, it is said that the chances are even that the results 
will be heads or tails. If it is unevenly balanced, the head side 
being more heavily weighted, it will probably more frequently 
fall tails. That is, bias — in the coin itself — controls the re- 
sults. If it is evenly balanced, but cleverly thrown, heads 
may markedly exceed tails. Bias in this case is personal. 
Chance (the name for that multitude of influences by which
a given event is determined but all of which are supposed to
operate without hindrance or bias) is interfered with.

Again, if cards are not evenly cut, equally smooth and of 
the same color and size, any one may be selected at will from 
a pack. If they are uniform in every particular, the chance 
of selecting a certain one is no greater than that of selecting 
any other one. If there are 52 in a pack, the chance of select- 
ing the king of spades is 1 out of 52. Some card is selected, 
and since there are 52 possibilities, the chance of getting any 
one is the same as that of getting any other. Again, since
there are 13 diamonds, the chance of selecting some one
diamond is 13 out of 52, or 1 in 4. But a diamond might not
be selected if four trials were made, after each of which the
card taken out is returned and the pack thoroughly shuffled.
One might not be secured even if eight trials under the same
conditions were made. But such a result would be very unlikely.
On the average, with repeated drawings, one diamond
out of each of four trials would tend to be selected. That is,
the “probability” is 1/4 that such a result would be secured.
In the long run this would tend to be true.
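
How unlikely it is to miss a diamond altogether may be reckoned directly. The following is a minimal sketch, assuming a uniform pack and that the card is returned and the pack shuffled after each trial, so that the trials are independent; the function name is illustrative only.

```python
from fractions import Fraction

# 13 diamonds in a uniform pack of 52 cards.
p_diamond = Fraction(13, 52)

def chance_of_no_diamond(trials: int) -> Fraction:
    """Chance that no diamond appears in `trials` independent draws with replacement."""
    return (1 - p_diamond) ** trials

four = chance_of_no_diamond(4)    # (3/4)^4 = 81/256
eight = chance_of_no_diamond(8)   # (3/4)^8 = 6561/65536

# Average number of diamonds to be expected in four trials.
expected_in_four = 4 * p_diamond

print(f"no diamond in 4 trials: {four} = {float(four):.3f}")
print(f"no diamond in 8 trials: {eight} = {float(eight):.3f}")
print(f"expected diamonds in 4 trials: {expected_in_four}")
```

On these assumptions a run of four trials without a diamond occurs a little less than once in three times, and a run of eight about once in ten.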

Let us return to the illustrations of tossing coins. Suppose 
one coin is tossed a number of times in succession (an analogous
case to measuring the same phenomenon a number of
times). What is the probability of getting a certain number
of heads and a certain number of tails?

In one toss, we may get either heads or tails. The chances, 
we say, are equal that one or the other result will be secured. 
Let the possible results be indicated as follows, H meaning
heads, and T, tails:

H, T 

In tossing the coin twice there can be four possible results. 
We can get 

HH, HT, TH, TT 

That is, a head in the second may follow a head in the first; 
a tail in the second, a head in the first; a head in the second, 
a tail in the first; and a tail in the second, a tail in the first. 

In three tossings, there are eight possible results, because 
to the four events previously possible, the H and T of the third 
coin may be combined. These may be set down, using the
same methods as above, as follows:

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT 

Similarly, with four tossings. In this case there are 16 possible
events.

HHHH    HHHT    HHTT    HTTT    TTTT
        HHTH    THHT    THTT
        HTHH    TTHH    TTHT
        THHH    HTTH    TTTH
                HTHT
                THTH

If in place of writing the H’s and T’s separately we write as 
an exponent the number of times H and T appear in each 
combination, we get 

\(H^4 + 4H^3T + 6H^2T^2 + 4HT^3 + T^4\), or
1 + 4 + 6 + 4 + 1

This is the number of ways in which the five combinations 
can appear by tossing one coin four times. If four coins were
thrown once (this is an analogous case to measuring each of
four things once) the result would be as follows, the different
coins being designated as (a), (b), (c), (d):


(a)(b)(c)(d)   (a)(b)(c)(d)   (a)(b)(c)(d)   (a)(b)(c)(d)   (a)(b)(c)(d)

HHHH           HHHT           HHTT           HTTT           TTTT
               HHTH           THHT           THTT
               HTHH           TTHH           TTHT
               THHH           HTTH           TTTH
                              HTHT
                              THTH


If each of these combinations is given an index notation, 
the result is the same as that secured when 1 coin is tossed 4 
times, viz: 

\(H^4 + 4H^3T + 6H^2T^2 + 4HT^3 + T^4\), or

1 + 4 + 6 + 4 + 1


Now this expression gives the same result as is obtained by 
raising the binomial (H + T) to the fourth power. If it 
were raised to the fifth power the corresponding number of 
cases would be 32, made up of 1 + 5 + 10 + 10 + 5 + 1,
each of the H’s and T’s in the preceding example being
combined with another H and T, thus producing twice as
many possible results. If it were raised to the 8th power,
the number of possible events would be 1 + 8 + 28 + 56 + 70
+ 56 + 28 + 8 + 1.
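
The counting described above may be repeated mechanically. The sketch below writes out every possible sequence of heads and tails, groups the sequences by the number of heads, and compares the result with the coefficients of the binomial expansion; the function name is illustrative only.

```python
from collections import Counter
from itertools import product
from math import comb

def head_counts(tosses: int) -> list[int]:
    """Count the H/T sequences of the given length, grouped by number of heads
    (most heads first, as in the expansion of the binomial)."""
    tally = Counter(seq.count("H") for seq in product("HT", repeat=tosses))
    return [tally[h] for h in range(tosses, -1, -1)]

for n in (4, 5, 8):
    counts = head_counts(n)
    # The enumeration should agree with the binomial coefficients.
    assert counts == [comb(n, k) for k in range(n, -1, -1)]
    print(f"power {n}: {sum(counts)} events, made up of {counts}")
```

Each added toss doubles the number of sequences, which is why the totals run 16, 32, and so on.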

From the “arithmetical triangle” ¹ the number of times each
combination may appear may be read off directly.²

It will be noted that each line of the “triangle” produces a
series which regularly increases and then decreases, reaching
a maximum at the center and shading off above and below.³
This is the probability distribution approached in the measure- 
ment of natural and physical phenomena. 
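
The triangle itself is easily built up line by line, each number being the sum of the two standing above it. The following sketch, with illustrative names, forms the first eleven lines and applies the rule quoted from Jevons in the footnote (the number of ways of choosing m things out of n is the (m + 1)th number in the (n + 1)th line).

```python
def arithmetical_triangle(lines: int) -> list[list[int]]:
    """Return the first `lines` lines of the arithmetical triangle."""
    rows = [[1]]
    while len(rows) < lines:
        prev = rows[-1]
        # Interior numbers are sums of adjacent numbers in the line above.
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows

triangle = arithmetical_triangle(11)
for number, row in enumerate(triangle, start=1):
    print(f"line {number:2d}: {row}")

# Jevons's example: a sub-committee of five out of a committee of nine.
n, m = 9, 5
ways = triangle[n][m]   # zero-based index of the (m + 1)th number in the (n + 1)th line
print(f"ways of choosing {m} out of {n}: {ways}")
```

Every line reads the same forwards and backwards and rises to its largest value at the middle, which is the regularity remarked upon in the text.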

An illustration from Jevons at this place is of interest. 

“Suppose, for the sake of argument, that all persons were naturally
of the equal stature of five feet, but enjoyed during youth seven inde- 
pendent chances of growing one inch in addition. Of these seven 
chances, one, two, three, or more, may happen favorably to any indi- 


¹ The Arithmetical Triangle

Column:    1st  2nd  3rd  4th  5th  6th  7th  8th  9th 10th 11th

Line  1      1
Line  2      1    1
Line  3      1    2    1
Line  4      1    3    3    1
Line  5      1    4    6    4    1
Line  6      1    5   10   10    5    1
Line  7      1    6   15   20   15    6    1
Line  8      1    7   21   35   35   21    7    1
Line  9      1    8   28   56   70   56   28    8    1
Line 10      1    9   36   84  126  126   84   36    9    1
Line 11      1   10   45  120  210  252  210  120   45   10    1


² See Jevons, W. Stanley, Principles of Science (2nd Edition), Macmillan
& Company, London (Reprint 1920). “In general language, if I
wish to know in how many ways m things can be selected in combinations
out of n things, I must look in the n + 1th line, and take the
m + 1th number, as the answer. In how many ways, for instance, can
a sub-committee of five be chosen out of a committee of nine. The
answer is 126, and is the sixth number in the tenth line; it will be
found equal to (9 · 8 · 7 · 6 · 5)/(1 · 2 · 3 · 4 · 5).” Ibid., p. 187.

³ In alternate series above that for the 3rd power the two middle
items are the same. See Jevons, op. cit., pp. 185-186, for certain other
properties of the “Arithmetical Triangle.”

vidual; but, as it does not matter what the chances are, so that the
inch is gained, the question really turns upon the number of combinations
of 0, 1, 2, 3, etc., things out of seven. Hence the eighth line of
the triangle gives us a complete answer to the question. . . . There
are altogether 128 ways in which seven causes can be present or
absent. Now, twenty-one of these combinations give an addition of
two inches, so that the probability of a person under the circumstances
being five feet two inches is 21/128. The probability of five
feet three inches is 35/128; of five feet one inch 7/128; of five feet 1/128,
and so on. Thus the eighth line of the Arithmetical Triangle gives
all the probabilities arising out of the combinations of seven causes.”
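
Jevons's figures may be reproduced in a few lines. The sketch below assumes, as his count of 128 ways implies, that each of the seven causes is as likely to be present as absent; the names are illustrative only.

```python
from fractions import Fraction
from math import comb

CAUSES = 7          # independent chances of gaining one inch
BASE_INCHES = 60    # the supposed natural stature of five feet

total_ways = 2 ** CAUSES   # each cause present or absent: 128 ways

# Probability of each stature: ways of gaining k inches over all ways.
stature_probability = {
    BASE_INCHES + k: Fraction(comb(CAUSES, k), total_ways)
    for k in range(CAUSES + 1)
}

for inches, p in stature_probability.items():
    feet, rest = divmod(inches, 12)
    print(f"{feet} ft {rest} in: {p.numerator}/{p.denominator}")
```

The eight probabilities together exhaust the 128 ways and therefore sum to unity.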


The theoretical number of times different combinations of 
heads and tails would be secured if ten coins were tossed is 
shown in Table 63. 

TABLE 63 

The Theoretical Distribution Secured by Tossing Ten Coins 


(The 11th line of the Arithmetical Triangle)

Character of Throw          Theoretical Numbers *

10 Heads    0 Tails                   1
 9 Heads    1 Tails                  10
 8 Heads    2 Tails                  45
 7 Heads    3 Tails                 120
 6 Heads    4 Tails                 210
 5 Heads    5 Tails                 252
 4 Heads    6 Tails                 210
 3 Heads    7 Tails                 120
 2 Heads    8 Tails                  45
 1 Heads    9 Tails                  10
 0 Heads   10 Tails                   1

Total                             1,024

* See the 11th line of the Arithmetical Triangle.


Now upon the comparative number of combinations, as 
shown in the arithmetical triangle, as Jevons says, is founded 
the theory of error ¹ to which appeal is made in quantitative
investigations.² The greater the number of times a group of
coins is tossed, or the greater the number of coins which are
tossed once, the nearer does the distribution of the results
actually secured tend to agree with the theoretical distribution
as given by the expansion of the binomial (H + T). Similarly,
with perfect random selection the greater the number of
natural phenomena of a homogeneous type which is measured
once, as well as the greater the number of times a single
phenomenon is measured, the nearer do the results secured
agree with those which would characterize the entire “population.”
Upon the assumption that chance in the first instance
and perfect random selection in the second produce the
distribution in the arithmetical triangle, a theory of error is
built up.

* Op. cit., pp. 188, 202.
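
The tendency of actual results to approach the theoretical distribution may be illustrated by an artificial experiment. The sketch below is not drawn from the text: the number of throws and the seed of the random-number generator are arbitrary choices made for illustration, and a computer's pseudo-random numbers stand in for an unbiased coin.

```python
import random
from collections import Counter
from math import comb

COINS = 10
THROWS = 20_000             # hypothetical number of throws of the ten coins
rng = random.Random(1927)   # fixed seed so that the run can be repeated

# Number of heads in each throw of ten unbiased coins.
observed = Counter(
    sum(rng.random() < 0.5 for _ in range(COINS)) for _ in range(THROWS)
)

# Theoretical share of throws showing h heads: the 11th line over 1,024.
theoretical = {h: comb(COINS, h) / 2 ** COINS for h in range(COINS + 1)}

largest_gap = max(
    abs(observed[h] / THROWS - theoretical[h]) for h in range(COINS + 1)
)

for h in range(COINS, -1, -1):
    print(f"{h:2d} heads: observed {observed[h] / THROWS:.4f}, "
          f"theoretical {theoretical[h]:.4f}")
print(f"largest difference: {largest_gap:.4f}")
```

Increasing the number of throws will ordinarily reduce the largest difference, though in any single run the agreement is only approximate.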

II. Properties of the Normal Law of Error Distribution

If the theoretical results of tossing ten coins, as shown in 
Table 63, are plotted with the frequencies measured on the 
Y axis, and the nature of combination on the X axis we get 
Figure 67. 

The ends of the ordinates are joined together by a smoothed 
line which would be approached if the exponent of the power 
of the binomial were increased — say to 999.^ Figure 67 illus- 
trates the so-called normal probability curve or normal law 
of error distribution to which reference has been made at 
various times. The shape of the curve is different for different 
exponents of the expansion of the bionomial. The lower the 
exponent, the more “peaked’^ the curve; the higher, the flatter 

¹ The term “error” is used in the sense that if a number of observations
are taken, the deviation or difference of any one of them from their
mean is an “error.”

² Op. cit., pp. 188-189.

³ For a figure in which the separate ordinates give essentially a smooth
curve, see Slichter, Charles S., Elementary Mathematical Analysis, 2nd
Edition, McGraw-Hill Book Co., New York, 1918, p. 212.

FIGURE 67

Graphical Representation of the Theoretical Distribution
Secured by Tossing Ten Coins



it appears. In all cases, however, the curves are alike in that 
they are symmetrical about a maximum — excesses and defects 
being equal — and shade off in a systematic and regular manner. 
Accordingly, such figures have certain mathematical proper- 
ties of which the following are the most important: 

1. The curve is uni-modal.

2. All of the instances are included beneath the curve and
   above the X axis.

3. Half of the instances are included on either side of the
   mean.

4. The arithmetic mean, median, and mode coincide; they
   are identical.

5. The standard deviation, S.D., cuts the curve at the points
   of inflection. Within a distance of one standard deviation,
   S.D., above and below the mean 68 per cent of
   the instances fall.

6. Within a range of 2/3, or more exactly .6745, of the
   standard deviation, S.D., when measured plus and
   minus from the mean, one half of all of the instances
   occur. This is the “probable error,” an expression
   which means that the chances are even that a measure
   (error or deviation from the mean) will fall within
   this interval.

7. The average deviation, A.D., is four fifths, or more
   exactly .7979, of the standard deviation, S.D.

8. The semi inter-quartile range, \((Q_3 - Q_1)/2\), is equal to the
   probable error, P.E.; that is, a distance above and
   below the mean within which one half of the instances
   fall.

9. The semi inter-quartile range, \((Q_3 - Q_1)/2\), when added to
   the lower quartile or when subtracted from the upper
   quartile is equivalent to the mean, median, and the
   mode, and equal to 2/3 of the standard deviation, S.D.

10. The probable error, P.E., is .845 of the average deviation,
    A.D.
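
The constants named in this list may be verified numerically. The following is a sketch only, using the `NormalDist` class of Python's standard `statistics` module for a curve of mean 0 and standard deviation 1; the variable names are illustrative.

```python
from math import pi, sqrt
from statistics import NormalDist

unit = NormalDist()   # normal curve with mean 0 and S.D. 1

# Property 5: share of instances within one S.D. of the mean.
within_one_sd = unit.cdf(1) - unit.cdf(-1)

# Properties 6 and 8: the distance which, laid off on both sides of the
# mean, takes in one half of the instances (the upper quartile).
probable_error = unit.inv_cdf(0.75)

# Property 7: average (mean absolute) deviation of the normal curve.
average_deviation = sqrt(2 / pi)

# Property 10: probable error as a fraction of the average deviation.
pe_over_ad = probable_error / average_deviation

print(f"within one S.D. : {within_one_sd:.4f}")
print(f"P.E. / S.D.     : {probable_error:.4f}")
print(f"A.D. / S.D.     : {average_deviation:.4f}")
print(f"P.E. / A.D.     : {pe_over_ad:.4f}")
```

Since these ratios belong to the normal curve alone, they serve as a standard of comparison rather than as values to be expected in every series.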

For the series showing the theoretical result of throwing 
ten coins, the arithmetic mean, median, and mode are 5; the 
standard deviation, S.D. or σ, is 1.58 and the probable error,
1.07. That is, it is an even chance that an item selected
purely at random will fall within 5.00 ± 1.07 or between 3.93
and 6.07. The width of the blank portion on Figure 68
shows the limits defined by the probable error, its area being 
one half of the total area under the curve. 
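
These figures follow from the theoretical numbers of Table 63. A minimal sketch of the computation is given below, the probable error being taken as .6745 of the standard deviation as in the list of properties; the variable names are illustrative.

```python
from math import comb, sqrt

heads = range(11)
frequency = [comb(10, h) for h in heads]   # 1, 10, 45, ..., 10, 1
total = sum(frequency)                      # 1,024

mean = sum(h * f for h, f in zip(heads, frequency)) / total
variance = sum(f * (h - mean) ** 2 for h, f in zip(heads, frequency)) / total
sd = sqrt(variance)
probable_error = 0.6745 * sd

lower, upper = mean - probable_error, mean + probable_error

# Throws of the discrete series lying inside the limits (4, 5 and 6 heads).
inside = sum(f for h, f in zip(heads, frequency) if lower <= h <= upper)

print(f"mean {mean:.2f}, S.D. {sd:.2f}, P.E. {probable_error:.2f}")
print(f"limits {lower:.2f} to {upper:.2f}")
print(f"throws inside the limits: {inside} of {total}")
```

The even chance is a property of the smooth curve; in the discrete ten-coin series the limits take in the throws of 4, 5, and 6 heads, which together are 672 of the 1,024.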

Not only may the gross items of chance series or the meas- 
urement of phenomena taken at random be plotted in the 
form of a probability curve, but the different means (aver- 
ages) of a number of such chance series or measurements may 
also be indicated in this manner. The means of different
measurements like the measurements themselves will vary.¹
If these were plotted as a frequency distribution, the form of
graph would approach the normal type. Such a series would
have a mean, a standard deviation, a probable error, etc.,
between which the relations may be expressed by a series of
constants, in the same manner as for the gross items. For
instance, the probable error of the mean is

\[ .6745 \, \frac{\text{S.D.}}{\sqrt{n}}, \]

and the probable error of the standard deviation is

\[ .6745 \, \frac{\text{S.D.}}{\sqrt{2n}}. \]
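
A short sketch may make the use of these constants concrete. It assumes the usual formulae, .6745 S.D./√n for the mean and .6745 S.D./√(2n) for the standard deviation; the sample of 400 items and its standard deviation of 1.75 are hypothetical values chosen for illustration, and the function names are not from the text.

```python
from math import sqrt

PE_FACTOR = 0.6745   # probable error as a fraction of the standard deviation

def probable_error_of_mean(sd: float, n: int) -> float:
    """Probable error of the mean of n items (assumed formula)."""
    return PE_FACTOR * sd / sqrt(n)

def probable_error_of_sd(sd: float, n: int) -> float:
    """Probable error of the standard deviation of n items (assumed formula)."""
    return PE_FACTOR * sd / sqrt(2 * n)

# Hypothetical sample: 400 items with a standard deviation of 1.75.
sd, n = 1.75, 400
pe_mean = probable_error_of_mean(sd, n)
pe_sd = probable_error_of_sd(sd, n)

print(f"P.E. of the mean: {pe_mean:.4f}")
print(f"P.E. of the S.D.: {pe_sd:.4f}")
```

Both quantities shrink with the square root of the number of items, so that quadrupling the sample halves the probable error.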


FIGURE 68 

The Area of the Normal Curve, Inside (blank), and Outside
(shaded), the Limits Set by One Times the Probable Error

[Figure: normal curve with the lower and upper quartiles marked,
each at a distance of 1 × P.E. from the mean]


III. The Meaning of the Probable Error Concept


The “significance” of individual measures and their means
is measured in terms of their probable errors. The probable
error is defined as a measure which, added to and subtracted
from the mean, marks the limits within which the chances
are even that an item selected at random will fall. It is said

¹ Pearl gives an interesting example of the variation of means taken
from a series of random selections. Pearl, Raymond, Introduction to
Medical Biometry and Statistics, Saunders, Philadelphia, 1923, pp. 210-
213.

conventionally that if a certain result is three or more times
as large as its probable error it is “significant.” What is
meant by this expression? The following illustrations taken
from Pearl ¹ will help to answer this question.

In Figure 68, the blank portion under the curve represents 
one half of the area. Accordingly, its boundaries on the X 
axis mark the limits of the probable error. 

In Figure 69, the corresponding blank portion representing 
twice the probable error comprehends 82.27 per cent of the 
area. The shaded portion includes 17.73 per cent of the area. 
Therefore, the odds are 82.27 to 17.73 or 4.64 to 1 that an item 
selected at random will fall within twice the probable error. 

FIGURE 69 

The Area of the Normal Curve, Inside (blank), and Outside
(shaded), the Limits Set by Twice the Probable Error



In Figure 70, the blank area represents three times the probable
error. It comprehends 95.70 per cent, while the shaded portions make
up but 4.30 per cent of the total area. Therefore, the chances
are 95.70 to 4.30 or 22.24 to 1, that an item taken at random
will fall within three times its probable error. Similarly, the
chances are 142.3 to 1 that an item will not exceed four times

¹ Pearl, Raymond, Introduction to Medical Biometry and Statistics,
Saunders, Philadelphia, 1923, pp. 215-216.

its probable error. In this case the part of the total area of a
probability surface falling outside of the limits of four times
the probable error is less than 1 per cent, or .698 per cent.¹
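
The areas and odds cited from Pearl may be recomputed from the normal curve. The sketch below takes the probable error as .6745 of the standard deviation and uses the `NormalDist` class of Python's standard `statistics` module; the function name is illustrative only.

```python
from statistics import NormalDist

PE_IN_SD = 0.6745     # probable error expressed in standard deviations
unit = NormalDist()   # normal curve with mean 0 and S.D. 1

def share_within(multiple: float) -> float:
    """Share of the area lying within `multiple` probable errors of the mean."""
    z = multiple * PE_IN_SD
    return unit.cdf(z) - unit.cdf(-z)

for k in (1, 2, 3, 4):
    inside = share_within(k)
    outside = 1 - inside
    print(f"{k} x P.E.: inside {inside:.2%}, outside {outside:.3%}, "
          f"odds {inside / outside:.2f} to 1")
```

The odds rise very rapidly with the multiple, which is the ground of the convention that three times the probable error marks a “significant” result.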

FIGURE 70 

The Area of the Normal Curve, Inside (blank), and Outside
(shaded), the Limits Set by Three Times the Probable Error

[Figure: normal curve with limits marked at 3 × P.E. on either
side of the mean]

To say that a measurement is “significant” when it is three
or more times as large as its probable error is, therefore,
equivalent to saying that the odds against its appearance
(once in 22.24 times when three times the probable error is
taken as a test) may be ignored. But as Pearl remarks: “As
a matter of fact, this is not true, unless one chooses to regard
4.3 per cent as a negligible fraction of a quantity.” ²

The “odds” given above refer to the probable error of a 
single measure. Those for means, and standard deviations are 
different as indicated by the formulae on p. 370. The probable
error of a correlation coefficient is discussed later.³

IV. Sample Measurements and the Uses of the Probable Error

Statistical studies are almost always made from samples. 
All prices cannot be included in computing an index number 

¹ See Pearl, op. cit., p. 218, for a table giving the “odds” for other
relationships between a measurement and its probable error.

² Op. cit., pp. 214-215.

³ pp. 464-465.

nor all rents determined when studying family budgets.
Neither the time required for all operators within manufactur- 
ing industries to complete an operation, nor the time necessary 
for every operator in telephone industries to answer the tele- 
phone calls of all subscribers, can be determined in order to 
answer specific inquiries. Samples must be used and some 
method employed for testing their reliability. Averages alone 
will not suffice; their limitations in describing frequency dis- 
tributions have already been indicated. The most common 
measure of divergence from type is the standard deviation. 
But it is simply a measure for the samples taken. What is
wanted is proof that the distribution in the samples indicates
the distribution that would result if the whole “population”
were included. The probable error supplies this. On the sup- 
position that if all the population were included a distribution 
would follow the normal curve of error, the probable error 
stands in a mathematical relation to the standard deviation 
in the same way that the radius of a circle does to the cir- 
cumference. Hence, the reliability of a sample may be ex- 
pressed in terms of its probable error. 

Breeders of animals and plants find it necessary to deter- 
mine the probable error of their measurements in studies of 
variation from type.¹ Moreover, in the selection of men according
to psychological and other tests,² in the grading of
cotton and grains, in the setting of tasks, and the establishment
of piece-rates of compensation on the basis of the
“average” operator’s performance, some measure of the reliability
of the samples must be employed. Again, according
to Fisher,³ the only scientific method of establishing the pure
premium for industrial accident insurance is to compare homogeneous
conditions of risk exposure and to test the homogeneity
by measures of dispersion in terms of their probable errors.
Conformity to the normal law is proof that conditions are
homogeneous. Most comparisons, it is held, involve non-homogeneous
conditions. The proper unit is not the “establishment,”
but similar risk conditions in many establishments or
industries.

¹ Davenport, Eugene, The Principles of Breeding, Ginn and Co., New
York, 1907, passim.

² Whipple, Guy M., Manual of Mental and Physical Tests, Warwick
and York, Baltimore, Md., 1914, passim.

³ Cf. Fisher, Arne, Proceedings of the Casualty Actuarial and Statistical
Society of America, Vol. II, Part III, No. 6, May, 1916.

It must be remembered that the probable error is a con- 
stant only for distributions of the normal probability form. 
It has no meaning for those which are markedly asymmetrical. 

V. Summary 

The theory of probability and the properties of the normal 
law of error lie at the basis of most of the statistical studies 
of natural and physical phenomena. They have less application
to problems growing out of human affairs where “chance”
does not freely operate, and where measurements are not subjected
to the law of error. Indeed, measurements of economic
and business phenomena do not necessarily follow the probability
form. They are generally asymmetrical or skewed. It
is to the measurement of asymmetry or skewness that
we turn in the following chapter.

CHAPTER XII


SKEWNESS OR ASYMMETRY

I. Introduction

The preceding chapter was concerned with an elementary
statement of the theory of probability and with the character- 
istics of the normal law of error curve or distribution which 
is expressive of this theory. While that which is probable 
must find its basis in experience, experience is finite and 
limited. Even the most protracted experiments of tossing 
coins, selecting cards from a pack, throwing dice, measuring 
the heights of soldiers, or the lengths of ears of corn have 
not succeeded in duplicating the probability curve which 
logic and belief prompt us to expect. All trials are limited 
in the sense that the entire “population” is not included and 
that time is not exhausted. Even though by repeated trials 
of coin tossing, for example, series secured by the expansion 
of the binomial were actually duplicated, such a result might 
be looked upon rather as an exception, the probability being 
almost certain that it would never be repeated. 

The statistician deals with samples. His measurements are 
secured not under circumstances of pure chance, but under 
those peculiar to time, place, and particular environment. 
Accordingly, the series which he selects do not exactly con- 
form to the probability curve. 

An analogy at this place is in point. Perfect circles exist 
only in imagination. So also do the precise relations of their 
diameters and circumferences. Yet mathematicians are not 
debarred from drawing circles nor from using the constant,
π or 3.1416. So, likewise, pure probability distributions are
a creation of the imagination. Yet acknowledging this to be
true, statisticians are not prevented from determining the 
degree to which distributions deviate from this ideal, nor from 
using the concept of probable error. 

In the run of experience, statistical distributions are skewed 
or unsymmetrical. The purpose of this chapter is to describe 
the more important ways by which asymmetry or skewness 
may be measured. 

II. Dispersion and Skewness Contrasted 

Measures and coefficients of dispersion, respectively, indicate
absolutely and relatively the differences of the separate
items in series from one taken as a standard. They measure 
deviations from type, varying emphasis being given to the 
differences depending upon the particular device used. The 
average deviation gives all of the differences their normal 
weight; the standard deviation accentuates those far removed 
from type. The quartile measure includes only those lying 
within the boundaries of the first and third quartiles. They 
do not, however, show the manner in which the deviations 
are distributed, nor do they localize them. They do not 
show the degree to which they cluster above or below the 
type selected. 

Measures of skewness, on the other hand, indicate the posi- 
tion relative to the mode or median at which distributions are 
pulled away, distorted, or skewed from normality, i.e., from
the symmetrical form of the curve of error. In the normal 
curve, mode, median, and arithmetic mean coincide. In un- 
symmetrical curves they differ in size. The function of meas- 
ures of skewness is twofold: (1) to indicate the direction of 
skew or asymmetry, and (2) to measure the amount either 
absolutely or relatively. 

Most, if not all distributions, are skewed to some degree,¹
the normal distribution being in fact “abnormal,” in the sense
that it is never realized. Indeed, it is probably true that
nature never repeats herself, although it is said that history
does. Asymmetry in a particular case may be due among
other things to imperfect measurements, inadequate sampling,
personal bias, etc. In the universe at large, however, it probably
rests upon more fundamental bases rooted in the fact
of variation and diversity. But asymmetry takes a variety
of forms, some marked, some slight, and it may be worth
while briefly to illustrate certain of its types.²

¹ Tolley, Howard R., “Frequency Curves of Climatic Phenomena,”
in Monthly Weather Review, United States Department of Agriculture,
Vol. 44, November, 1916, pp. 634-642, 636.

III. Types of Skewed Distributions

An ideal symmetrical frequency distribution is shown in 
Figure 71. 

FIGURE 71 

The Form of the Ideal Symmetrical Frequency Distribution



² More elaborate illustrations are given in Yule, G. U., An Introduction
to the Theory of Statistics, Griffin, London, 1911, Chap. VI.

Two ideal distributions of the moderately asymmetrical
type are shown in Figure 72. 

A distribution approaching the moderately asymmetrical
form is given in Table 29. Each of the curves in Figure 72
approaches the normal, bell-shaped type, but neither is symmetrical.
A mode is evident in each case, but the items are
not uniformly distributed about it; that is, the distribution is
skewed.


FIGURE 72 

The Forms of Ideal Moderately Asymmetrical or Skewed 
Distributions 



On the other hand, in Figure 73 the distribution is of quite
a different kind,^ the peculiar shape being primarily due to
the fact that non-homogeneous groups in the attribute measured
are grouped together.

Still another general type is encountered. Figure 74 shows 
an ideal J-shaped distribution, while four series approaching 
this form are given in the footnotes on pages 382 and 383. 

^ See Yule, op. cit., p. 103, for an illustration of the ideal U-shaped
form.

FIGURE 73

U-Shaped Distribution Curve of Deaths per 1,000 Population at
Specified Age Periods, United States Registration States, 1920



Other illustrations might be given, but these will suffice for
our purposes. The customary measures and coefficients of
skewness are applied to curves following the general type of
those in Figure 72, that is, where a mode is present, the distribution
of items around it tending to be regular and systematic,
but where there is not a perfect balance on either
side. It is this type of curve with which we are concerned
in the following section.

FIGURE 74

The Form of the Ideal J-shaped Frequency Distribution Curve

IV. Measures and Coefficients of Skewness

The chief and currently used measure of skewness is the
difference between the arithmetic mean and the mode. If
the mean exceeds the mode, skewness is said to be positive.
If it is less than the mode, it is said to be negative. The mode,
of course, is unaffected by extreme items whether large or
small. The arithmetic mean, on the other hand, is influenced
not only by the size but also by the number of items. If distributions
are normal, that is, if the “errors” in excess and in
defect of the mean are equal in number and in extent of deviation,
those which are positive cancel those which are negative,
and the mean has the same position as the mode. If
they are unsymmetrical, then the arithmetic mean may be
greater or less than the mode depending upon the position
of asymmetry. Accordingly, the difference between them may
be used as a measure of skewness. The sign (+) or (-)
secured by the computation, mean - mode, indicates the direction
of skewness; the difference indicates its amount.

The following examples show distributions which are clearly
asymmetrical:

Illustration 1

Number of Divorces in the U. S., 1887 to 1906, Classified by
Number of Years of Married Life.
(U. S. Statistical Abstract, 1913, p. 85.)

No. of Years Married     No. of Divorces
Total                            900,584
Under 5                          255,085
5 to 9                           282,904
10 to 14                         162,407
15 to 19                          91,176
20 to 24                          54,578
25 to 29                          29,245
30 to 34                          15,035
35 to 39                           6,555
40 to 44                           2,507
45 to 49                             805
50 and over                          287

Illustration 2

Table Showing Number of Individuals and Corporations Assessed
for Income Tax in 12 Wisconsin Counties, Classified by
Amount Groups of Assessed Incomes.
(Rept. Wis. Tax Commission, 1912, p. 37.)

Assessed Income          Number Assessed
Total                             11,935
Under $1000                        7,890
$1000 to $1999                     1,910
2000 to 2999                         786
3000 to 3999                         406
4000 to 4999                         234
5000 to 9999                         411*
10,000 and over                      298*

* Notice the widths of the groups.

Illustration 3

Table Showing Distribution of Percentages of Cost of Collection
to Total Collections, Internal Revenue of the U. S., 67 Districts,
1913. (Compiled from the Report of the Commissioner of
Internal Revenue, 1913, p. 211.)

Percentage Groups        No. of Districts (Frequency)
Total                                 67
0 to 2                                29
2 to 4                                24
4 to 6                                 4
6 to 8                                 4
8 to 10                                4
10 to 12                               0
12 to 14                               1
14 to 16                               1

Illustration 4

Number of Weavers Weaving Worsted Goods in the U. S. and
Receiving Specified Wage-rates Based upon Actual Weaving
Time on Yardage at Regular Piece-rates per Yard, Including
Ordinary Stoppage of Loom.
(Report of Tariff Board on Schedule K, Vol. IV, p. 1007.)

Earnings per Hour                 Number
Total                               3182
10 to 12                             165
12 to 14                             275
14 to 16                             375
16 to 18                             490
18 to 20                             490
20 to 22                             438
22 to 24                             414
24 to 26                             235
26 to 28                             150
28 to 30                             108
30 to 32                              34
32 to 34                               4
34 or over                             4

Inasmuch as the mode as an average is not rigidly defined,
its amount in a particular case may be in doubt. Interpolation
is then necessary. Various methods by which this may be
done have already been suggested,^1 but each of them is more
or less arbitrary. Different methods may give different
amounts. But the above formula for skewness requires an
exact mode; it cannot be used when the mode is given simply
as a group or as falling within certain limits. A purely empirical
interpolation formula for the mode, for moderately
asymmetrical series, is as follows:

Mode = mean - 3 (mean - median)

That is, the median lies about one third of the distance from
the mean toward the mode. This formula, however, should
be used with caution. It does not hold for markedly asymmetrical
distributions because of the effect which exceptionally
small or large items have on the mean. The fact of skewness
may be determined by rough methods, even by inspection in
most cases, but a measurement of the degree of skewness
by this method necessitates the location of an exact mode.

^1 See Chapter IX, pp. 297-307.
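The empirical relation between mean, median, and mode stated above may be sketched in a few lines of Python. The function name is hypothetical, and the figures are the 1907 mean and median quoted in Table 64, used here only to show the arithmetic. The result is a rough estimate and need not agree with a mode found by interpolation within the modal group.

```python
def empirical_mode(mean: float, median: float) -> float:
    """Estimate the mode from the empirical relation
    mode = mean - 3 (mean - median).

    Intended only for moderately asymmetrical series.
    """
    return mean - 3.0 * (mean - median)


# Illustrative figures: the 1907 mean and median from Table 64 (cents per hour).
mean_1907 = 14.56
median_1907 = 13.76

estimate = empirical_mode(mean_1907, median_1907)
print(f"estimated mode: {estimate:.2f} cents")

# The median then lies one third of the way from the mean toward the estimate.
one_third = (mean_1907 - estimate) / 3.0
print(f"mean - median = {mean_1907 - median_1907:.2f}; one third of (mean - mode) = {one_third:.2f}")
```

The estimate differs from the interpolated mode given in Table 64 for the same year, which is in keeping with the caution that different methods of locating the mode may give different amounts.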

But more than a measure of skewness is required if series 
in this respect are to be compared. Differences between means 
and modes, as the amounts themselves, are always expressed 
in the unit in which series are measured. These may be feet,
inches, gallons, dollars, cents, or what not. It is meaningless,
therefore, to say that because the difference between the
mean and mode in one series expressed in dollars, for instance,
is larger than the difference in another series expressed
in cents, feet, or inches, that skewness or asymmetry is 
greater. Some method of reducing the amounts to a common 
denominator must be used before comparison is possible. It 
is asymmetry which is being compared; not the units in which 
the measurements are made. What common denominator is 
most suitable? 

Skewness is divergence from symmetry, and symmetry is
uniform dispersion with respect to the mean. Standard and
average deviations for series which are widely dispersed are
large; for those which are narrowly dispersed, they are small.
The most satisfactory measure of dispersion being the standard
deviation, or S.D., this may be used as a divisor in order
to reduce to the same denomination amounts of skewness.
Accordingly, the coefficient of skewness based on the positions
of the mean and the mode is

$$\frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$

The measurement of skewness is always indicated by a
plus (+) or minus (-) sign prefixed to an amount, the unit
being the same as that in which a series is measured. The
coefficient of skewness is always indicated by a decimal prefixed
by a plus (+) or minus (-) sign, the units in the numerator
and in the denominator in the formula,

$$\frac{\text{Mean} - \text{Mode}}{\text{S.D.}},$$

being the same. When the mean and the mode coincide, both
the measure and the coefficient are zero.
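The measure and the coefficient just described may be computed as in the following minimal Python sketch. The function names are hypothetical. The figures are the 1907 mean, interpolated mode, and standard deviation reported in Table 64, so the results may be set beside the corresponding entries of that table.

```python
def skewness_measure(mean: float, mode: float) -> float:
    """Absolute measure of skewness: mean - mode, in the unit of the series."""
    return mean - mode


def skewness_coefficient(mean: float, mode: float, std_dev: float) -> float:
    """Coefficient of skewness: (mean - mode) / S.D., a pure number."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return (mean - mode) / std_dev


# 1907 figures from Table 64 (cents per hour).
mean, mode, std_dev = 14.56, 13.08, 3.67

measure = skewness_measure(mean, mode)
coefficient = skewness_coefficient(mean, mode, std_dev)

print(f"measure of skewness:     {measure:+.2f} cents")
print(f"coefficient of skewness: {coefficient:+.2f}")
```

The sign is carried through the division, so a mean below the mode yields a negative coefficient, and a mean equal to the mode yields zero for both quantities.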

But the position, measure, and coefficient of skewness may
be secured for a part rather than for the whole of a series. 
The conventional method is to measure the portion lying be- 
tween the first and the third quartiles. If a series is sym- 
metrical for this half, the quartiles are equally distant from 
the median. That is, one half of the difference between them 
when added to the lower or subtracted from the upper quartile 
gives the median amount. Accordingly, the nature and 
amount of skewness, within the quartile range, is indicated by 
the formula 


$$(Q_3 + Q_1) - 2\,(\text{Median})$$

If the quartiles are equally distant from the median, this 
formula gives zero. If the distance from the median to the 
upper quartile exceeds that from the lower quartile to the 
median, the formula gives a positive quantity. If the reverse 
is true, it gives a negative amount. The position of skewness,
that is, relative to the median but applying only to the
middle half of a series, is indicated by the nature of the sign.
The amount of skewness is shown by the quantity accompany- 
ing the sign. 

But this measure like that based upon the mean and the 
mode must be stated as a ratio before comparisons can be 
made between series measured in different units. An appropriate
common denominator is $Q_3 - Q_1$. The formula for the
coefficient of skewness based on the quartiles is, therefore, as
follows:

$$\frac{(Q_3 + Q_1) - 2\,(\text{Median})}{Q_3 - Q_1}$$

the result being written with a plus (+) or minus (-) sign
as a prefix.

One half the total frequencies are included between $Q_1$ and
$Q_3$. In a symmetrical distribution, $Q_1$ and $Q_3$ are equally distant
from the median. In an asymmetrical distribution, this
is not the case. If for this part of the series skewness is posi- 
tive, the third quartile is farther removed from the median 
than is the first quartile. If skewness is negative, the reverse 
is true. 
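The quartile measure and coefficient may be sketched in Python as follows. The function names are hypothetical. The 1908 quartiles and median from Table 64 serve as the example, and a made-up symmetrical case is added to show the zero result.

```python
def quartile_skewness(q1: float, median: float, q3: float) -> float:
    """Quartile measure of skewness: (Q3 + Q1) - 2 * Median."""
    return (q3 + q1) - 2.0 * median


def quartile_skewness_coefficient(q1: float, median: float, q3: float) -> float:
    """Quartile coefficient: the quartile measure divided by Q3 - Q1."""
    spread = q3 - q1
    if spread <= 0:
        raise ValueError("Q3 must exceed Q1")
    return quartile_skewness(q1, median, q3) / spread


# 1908 figures from Table 64 (cents per hour).
q1, median, q3 = 11.48, 14.22, 17.77

measure_1908 = quartile_skewness(q1, median, q3)
coefficient_1908 = quartile_skewness_coefficient(q1, median, q3)
print(f"quartile measure:     {measure_1908:+.2f} cents")
print(f"quartile coefficient: {coefficient_1908:+.2f}")

# Hypothetical symmetrical case: quartiles equally distant from the median.
symmetric = quartile_skewness_coefficient(10.0, 15.0, 20.0)
print(f"symmetrical case:     {symmetric:+.2f}")
```

Because the divisor is the distance between the quartiles, the coefficient describes only the middle half of the series. Extreme items beyond the quartiles do not affect it.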

The quartile type of skewness measure may also be applied 
to the halves of series above and below the median. If it is 
applied to the lower half, the formula is 

$$\frac{\text{Smallest item} + \text{Median} - 2\,Q_1}{\text{Median} - \text{Smallest item}}$$

If it is applied to the upper half, the corresponding formula is

$$\frac{\text{Largest item} + \text{Median} - 2\,Q_3}{\text{Largest item} - \text{Median}}$$

A positive or negative measurement or coefficient of skewness 
of a series shows that it is not normal. In the measure and 
coefficient based on the mean and the mode, asymmetry is 
localized relative to the mode. In those based on the quartiles, 
it is indicated relative to the median. But the median and 
the mode are identical in normal distributions. In those which 
are skewed, the mode is least and the arithmetic mean most 
affected by asymmetry. The median holds an intermediate 
position. 

V. Methods of Summarizing Frequency Series

The three primary methods of summarizing frequency series 
are (1) to average the gross items using the arithmetic mean, 
median, mode, or other suitable measure; (2) to summarize 
by the method of averages or otherwise the deviations (errors) 
of the items from a standard or type, that is, to calculate
measures and coefficients of dispersion; (3) to determine the
nature and amount, if any, of skewness, that is, departure 
from the symmetry of the normal probability distribution. 

An adequate description of a statistical series requires not 
alone one of these summaries but all of them. Each of them 
tells a different story. If the averages of gross items closely 
agree, the normal law of error distribution is approached; if 
dispersion is small, the measures tend to be homogeneous. If 
skewness is present and negative, large deviations are found
below the mean; if it is present and positive, such deviations
are above the mean. 

Measures and coefficients of both dispersion and skewness 
should be in everyday use in statistical work. For two or
more series arithmetic means may be identical, but dispersion
and skewness different. These facts are important. Current
comparisons of sales, wages, interest rates, stock and bond
prices, etc., by means of such measures could not fail to throw
new light on the problems of business.

Without carrying through the arithmetical steps in the
computation of different summaries, since this would involve
unnecessary repetition of the methods already given, their
use may be illustrated by comparing wage data for a single
occupation in eighteen identical establishments, reported by
the United States Bureau of Labor Statistics.^1

Table 64 gives the classified wage data and the summaries 
computed from them. Figures 75 and 76 show graphically the 
detail of the series and the positions at which the different 
averages fall. 

What are some of the things which these summary figures 
show? 

1. The arithmetic mean exceeds both the median and the
mode^2 in each year. Skewness is, therefore, positive.

^1 Bulletin of the United States Bureau of Labor Statistics, Whole
Number 190, May, 1916, p. 139.

^2 A single mode is indeterminate in 1908 and 1910.

TABLE 64

Table Showing Classified Wage-Rates of Female Menders in
Eighteen Identical Woolen and Worsted Manufacturing
Establishments, by Years, Together with Certain
Measures of Dispersion* and Skewness*

Classified Wage-Rates of Female Menders, by Years

Wage Groups (Cents per Hour)            1907        1908        1909        1910

Total                                    403         341         583         498
6 to 8                                     -                       3           1
† 8 to 9                                   2                      44          14
† 9 to 10                                 27          22          91          44
10 to 12                                  68          71         117         125
12 to 14                                 119          61          82          81
14 to 16                                  81          57          86          58
16 to 18                                  37          39          49          30
18 to 20                                  34          35          42          82
† 20 to 25                                31          35          58          43
25 to 30                                   4          10          11          16
† 30 to 40                                                                     4
‡ 40 and over

Arithmetic Mean                       14.56¢      15.01¢      13.96¢      14.97¢
Mode (by interpolation)               13.08¢          ** [illegible]          **
First Quartile                        12.07¢      11.48¢      10.14¢      11.05¢
Median (Second Quartile)              13.76¢      14.22¢ [illegible] [illegible]
Third Quartile                        16.32¢      17.77¢      16.61¢      18.52¢

Dispersion:
Average Deviation                      2.86¢       3.54¢       3.75¢ [illegible]
Standard Deviation                     3.67¢       4.47¢       4.58¢       4.96¢
Coefficient on A. D.                    .196        .236        .269        .287
Coefficient on S. D.                    .252        .298        .328        .331

Skewness:
Arithmetic Mean - Mode                +1.48¢          **      +3.01¢          **
Quartile Measure                       +.87¢       +.81¢      +2.57¢      +2.33¢
Coefficient on S. D.                    +.40          ** [illegible]          **
Coefficient on Quartile                 +.21        +.13 [illegible]        +.31

* Computed.   † Notice size of group.   ‡ Notice residuum.   ** Indeterminate.

FIGURE 75

Curves Showing, by Years, Classified Wage-Rates of Female
Menders in Woolen and Worsted Establishments, 1907-1908

(Vertical scale: Per Cent)

2. Both the average and the standard deviations, as well 
as the coefficients of dispersion based on them, tend to increase
from year to year. That is, the average differences
in rates when measured from the arithmetic mean tend to 
be larger both absolutely and relatively. 

3. The lower quartile position in 1907 is essentially as 
high as the median in 1909. The range of difference in rates 
between the median and the upper quartile is more than double 
in 1910 what it is in 1907. 

FIGURE 76

Curves Showing, by Years, Classified Wage-Rates of Female
Menders in Woolen and Worsted Establishments, 1909-1910

(Vertical scale: Per Cent)

4. In both 1909 and 1910 there is a much more pronounced
skew between the medians and the upper quartiles than in
1907 and 1908, the coefficients on the quartile measures being,
respectively, +.21, +.13, [illegible], and +.31.

5. The wage-rates which the middle half received varied
as follows:

1907, from 12.07 to 16.32 or 4.25¢

1908, from 11.48 to 17.77 or 6.29¢

1909, from 10.14 to 16.61 or 6.47¢

1910, from 11.05 to 18.52 or 7.47¢

That is, the position of the lower quartile, with one exception,
has fallen, and that of the upper quartile, with one exception,
risen. While the average rate in 1910 is less than one
half cent higher than in 1907, the wage of the person three
fourths up in the scale is more than two cents higher.

6. The coefficient of dispersion based on the average deviation,
and the coefficient of skewness based on the quartile
measure are higher in 1909 and 1910 than in any other of the 
years. Negative skewness indicates a healthy influence in wage 
conditions — a concentration above the arithmetic mean. On 
the other hand, the wide absolute and relative dispersions tend 
to counteract this. 

Other detailed facts may be gleaned from a comparison of 
these summaries, but those given are sufficient to show how 
they may be used. 
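Two of the comparisons drawn above, the relative dispersion in each year and the range of rates received by the middle half, may be checked with a short Python sketch. The names are hypothetical. The coefficient of dispersion is taken here as the standard deviation divided by the arithmetic mean, an assumption which agrees with the entries of Table 64. All figures are those given in the table and in item 5.

```python
YEARS = (1907, 1908, 1909, 1910)

# Figures from Table 64 and item 5 above (cents per hour).
MEAN = {1907: 14.56, 1908: 15.01, 1909: 13.96, 1910: 14.97}
STD_DEV = {1907: 3.67, 1908: 4.47, 1909: 4.58, 1910: 4.96}
Q1 = {1907: 12.07, 1908: 11.48, 1909: 10.14, 1910: 11.05}
Q3 = {1907: 16.32, 1908: 17.77, 1909: 16.61, 1910: 18.52}


def coefficient_of_dispersion(std_dev: float, mean: float) -> float:
    """Relative dispersion: S.D. / mean (assumed definition, see lead-in)."""
    return std_dev / mean


def middle_half_range(q1: float, q3: float) -> float:
    """Width of the interval between the first and third quartiles."""
    return q3 - q1


summary = {
    year: (
        round(coefficient_of_dispersion(STD_DEV[year], MEAN[year]), 3),
        round(middle_half_range(Q1[year], Q3[year]), 2),
    )
    for year in YEARS
}

for year in YEARS:
    coefficient, spread = summary[year]
    print(f"{year}: coefficient on S.D. = {coefficient:.3f}, middle-half range = {spread:.2f} cents")
```

Both columns rise from 1907 to 1910, which is the tendency noted in items 2 and 5.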


VI. Conclusion 

It is generally not enough to speak in terms of averages 
when characterizing statistical series. Deviations both as to 
amount and position are frequently quite as important as the 
averages themselves. After all, any sort of summary sacri- 
fices part of the detail; but the sacrifice is less when different 
types are used to supplement each other than when reliance 
is placed in one alone.
CHAPTER XIII


THE THEORY AND MEASUREMENT OF 
CORRELATION 

I. Introduction 

Any body of data or any statistical series may be analyzed 
descriptively by giving the details in tabular or in graphic 
form. If summaries are appropriate, averages of different 
types may be taken of the gross items and the relations of 
one to the other indicated. With these statistical abbreviations
as points of departure, the deviations of the gross items
may be further summarized by the use of averages. That is, 
dispersion in its absolute and relative aspects may be computed.
But since dispersion indicates neither symmetry nor
divergence from normal, measures and coefficients of skewness
are needed.

If two or more bodies of data or statistical series are to be
compared, any one or all of these devices may be used. Tabular
and graphic forms give the detail; averages, when expressed
in a common unit, admit of direct comparison. For
instance, a statement such as the following is significant: The 
average expense of doing business in retail meat stores is 19 
per cent; in retail clothing stores, 24 per cent of sales. On 
the other hand, statements such as these have no comparative 
meaning: the average rent expense is 2 per cent of sales in 
retail meat stores; in the same type of stores the average 
number of times stock is turned is once in two days. Both 
amounts are averages, but not of the same things. Hence,
comparatively, they are meaningless. 

Moreover, the amounts of dispersion in two series cannot
be compared by means of their standard deviations unless the
averages from which they are computed are identical. If they 
are different, ratios are required, the respective averages con- 
stituting a common denominator with which to reduce the 
absolute amounts to a relative basis. The same type of ob- 
servation applies to measures of skewness. It is impossible to 
compare degrees of asymmetry in two or more series by saying 
that in one skewness is + 7 and in another + 4. The 7 and
4 have comparative significance only in case the standard 
deviations are identical. If they are not the same, the re- 
spective standard deviations as divisors reduce them to the 
same denomination. Comparison is then possible. 

Now in all statistical work comparison of one sort or another
is the goal. In some cases what is wanted are comparative
pictures of a single series as shown by different measures
of its attributes.^1 In others, it is a comparative picture of
different series by the use of the same measures of their properties.^2

But not infrequently one desires to compare for two or more 
series the corresponding deviations from their respective averages.
That is, interest lies in getting a statistical measure of
congruence of change in the deviations. In this case, pairs 
of values are dealt with, the purpose being to measure the 
manner and degree in which they concurrently fluctuate or 
deviate from a norm or standard. A ratio of some sort which 
will summarize the relations which they bear to each other is 
needed. 


^1 See the different summaries in any one of the columns in Table 64.

^2 See the corresponding summaries in the different columns in Table 64.

II. Comparison, Causation, and Correlation

Comparison can be made only between things possessing
common qualities. These may be of time, of place, or of condition.
For instance, the accidents in a given industry
may be compared before and after the installation of safety
devices. Moreover, comparisons may extend to two industries
operating at different places or under different conditions, the 
purpose being merely to record a quantitative difference. But 
they are rarely made for this end alone. Generally, a more 
or less definite purpose of establishing a causal connection 
lies in the background. A specific inquiry is undertaken to 
determine whether phenomena stand in the relation of cause 
and effect, or whether they are the result of a common cause. 

To establish cause and effect relations between economic 
and social phenomena, however, is as alluring as it is difficult.
Such phenomena grow out of the facts of business, the obser- 
vations of science, the records of history, etc., and are inter- 
preted differently by different people, at different times, and 
for different conditions. Their seeming unity and identity 
are only relative, and the order of cause and effect not hard 
and fast. 

Variations at a given time and changes over a period of
time, characteristic of our economic and social life, are all
traceable to a complex of causes.^1 A given cause is not
homogeneous except when viewed in the most superficial manner.
Moreover, its “effects” are not always the same; they
vary. In some cases “cause and effect” seem to be coincident
in time; in others the “effects” follow the “causes” as sequences
spread over long or short periods. Indeed, what appears to be
a “cause” may be an “effect” of an antecedent “cause.” In
the physical, natural, and social world, “cause and effect” are
in reality variates.^2 How true this is may be seen by briefly
referring to some of the more common relations among business
phenomena.

^1 See the definition of Statistics, supra, p. 10 ff.

^2 Cf. Hooker, R. H., “Correlation of the Marriage Rate with Trade,”
Journal of the Royal Statistical Society, Vol. 64, p. 485.

Stimulation of business shows itself in an increase of bank
debits, but not all banks are equally affected. Interest rates
ultimately respond but not uniformly in different markets.
Excessive issues of irredeemable paper currency ultimately
result in a premium on gold and in a general increase in
prices, but not concurrently with the issue nor to the same 
degree for different types of business transactions. The sur- 
plus reserves of banks are said currently to fix the call-loan 
interest rate. But not all loans, nor all banks nor customers 
are affected at the same time and to the same degree. Whole- 
sale and retail prices fluctuate together, but the former fall 
first and rise first, the latter following some distance behind. 
The effect of cotton prices on acreage is shown only from one 
cropping to another, and then not uniformly over the cotton 
area. Wages undoubtedly tend to rise with rising prices, but 
not coincidently, nor to the same degree in all trades. Busi- 
ness prosperity undoubtedly stimulates immigration but only 
after a period of time. The relation is sequential. Moreover, 
general prosperity is far from uniform for areas, for industries, 
and for classes.^1

Comparison, therefore, involves pairing things or events
which are not identical in all particulars as to time, place,
and condition, and in fact becomes correlation. A
study of cause and effect, whether of coincidence or sequence,
becomes largely a study of association. The idea that a
given effect is the result of a specific cause, or that the effect 
must in the nature of the case be uniform and absolute, does 
not apply to business and economic phenomena. Causes never 
operate under exactly the same circumstances. Oneness of 
effect is only apparent, variation being evident the moment 
that the scale of measurement is reduced.^2

^1 See King, W. I., Employment Hours and Earnings in Prosperity and
Depression, United States, 1920-1922, The National Bureau of Economic
Research, New York, 1923, passim.

^2 When making comparison in economics or business, there is a tendency
to attempt to safeguard oneself against error and criticism by
introducing the proviso, other things being equal. But the “other
things” are rarely if ever equal in actual life.

Business does not go on indefinitely repeating itself in one
unending round of sameness. Variation characterizes all phenomena
which involve the human element, whether viewed as
cause or as effect. The tendency to look upon business and
economic phenomena in a mechanistic manner, to expect a 
complete and narrow fulfilment of the law of cause and effect, 
needs to be dispelled. Just as soon as it is, the way is open 
for the use of scientific method. This is the method of dis- 
crimination, of the study of small differences, of acting in 
the light of facts properly interpreted, and of reducing them 
as classified knowledge into rules of action. 

The conclusion to which facts point may be nothing more,
for instance, than that it is unwise to market corn with high
moisture content, since weight varies inversely with moisture,^1
or to leave corn in leaky cars exposed to hot weather because
both are conducive to the development of acidity, and acidity
retards germination;^2 that a “bacon” hog can be produced;
that corn grown from seed from ears 10 inches long has, on
the average, longer ears than corn grown from seed of ears
that are 8 inches long;^3 that the prices of bonds with
fixed interest rates vary inversely with general commodity
price changes;^4 that a farm of less than 40 acres in a
certain district is economically undesirable;^5 that the milk
production of cows increases until the animals are at least six
years of age and then falls off;^6 that there is a direct relation
between fatigue and industrial accidents;^7 that accident rates
tend to increase with expanding and to contract with falling
business;^8 that twin offspring from twin parents in sheep
production is more common than from parentage conforming
to any other condition;^9 etc. Whatever they are and to whatever
type of business they apply, if they are arrived at as a
result of a dispassionate study of facts in an attempt to determine
association and correlation and not to prove the infallibility
of some narrow cause-and-effect relationship, a clear
advance is made in the use of statistical methods.

^1 Bulletin of the United States Department of Agriculture, No. 472,
October, 1916, “Improved Apparatus for Determining the Test Weight
of Grain, with a Standard Method of Making the Test.” See curve on
p. 4.

^2 Bulletin of the United States Department of Agriculture, No. 102,
July, 1914, on “Acidity as a Factor in Determining the Degree of
Soundness of Corn,” pp. 12, 14, passim.

^3 “Type and Variability in Corn,” Bulletin No. 119, University of
Illinois Agricultural Experiment Station, October, 1907.

^4 Mitchell, Wesley C., Business Cycles, University of California
Studies, Berkeley, 1913, pp. 201-219, especially charts 23 and 24, pp. 206
and 207, respectively.

^5 Bulletin of the United States Department of Agriculture, No. 341,
January, 1916, on “Farm Management Practice of Chester County, Pa.,”
pp. 56 ff.

^6 Holdaway, C. W., “Statistical Weighting for Age of Advanced Registry
Cows,” The American Naturalist, Vol. 50, No. 559, p. 681.


^7 “The Case of the Shorter Day,” Franklin O. Bunting vs. The State
of Oregon, Brief for the Defendant in Error, by Felix Frankfurter, Vol.
I, pp. 165-193.

^8 Mowbray, A. H., and Black, S. B., “Relation of Accident Frequency
to Business Activity,” in Proceedings of the Casualty, Actuarial and
Statistical Society of America, Vol. II, Pt. III, No. 6, May, 1916, pp.
418-426.

^9 Rietz, H. L., and Roberts, Elmer, “Degree of Resemblance of Parents
and Offspring with Respect to Birth of Twins for Registered Shropshire
Sheep,” in Journal of Agricultural Research, Vol. IV, No. 6, September,
1915.

III. The Meaning of Correlation
1. DEFINITION AND EXPLANATION

If it is impossible in social affairs to establish causation in a
narrow sense, since causes operate as variations and effects
show themselves in the same way, it is unnecessary to conclude
that cause-and-effect relations in a larger sense cannot
be measured. The problems are different. The first is the
impossible task of establishing an absolute cause and an absolute
effect; the latter is the problem of measuring correlation.
Pearson makes the distinction clear in the following
passage:

“When we vary the cause, the phenomenon changes, but not always
to the same extent; it changes, but has variation in its change. The
less the variation in that change, the more nearly the cause defines
the phenomena, the more closely we assert the association or the
correlation to be. It is this conception of correlation between two
occurrences, embracing all relationships from absolute independence
to complete dependence, which is the wider category by which we
have to replace the old idea of causation. Everything in the universe
occurs but once, there is no complete sameness of repetition. Individual
phenomena can only be classified, and our problem turns on
how far a group or class of like, but not absolutely same, things which
we term ‘causes’ will be accompanied or followed by another group
or class of like, but not absolutely same, things which we term
‘effects.’”^1

What correlation, as thus distinguished from causation,
means is indicated by Davenport as follows:

“The whole subject of correlation refers to that interrelation between
separate characters by which they tend, in some degree, at
least, to move together. This relation is expressed in the form of a
ratio. Thus, if an increase of one character is always followed by a
corresponding and proportional increase in a related character, the
correlation is said to be perfect and the ratio is 1. On the other
hand, if an increase in one character is followed by a corresponding
and proportional decrease in a related character, the correlation is
said to be negative and the ratio is -1, or perfect negative correlation.
Still again, if the characters in question are absolutely indifferent
the one to the other, the correlation is said to be zero, indicating
mere association under the law of independent probability,
without causative relation of any kind.”^2
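The three cases which Davenport describes, a ratio of 1, of -1, and of zero, may be illustrated by a short Python sketch. The passage quoted does not state how the ratio is computed. The sketch assumes the product-moment form, that is, the sum of the products of paired deviations divided by the square root of the product of the sums of squared deviations. The function name and the three small series are invented for the purpose.

```python
from math import sqrt


def correlation(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation of paired values (assumed formula, see lead-in)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("two series of equal length, with at least two pairs, are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations from the mean of the first series
    dy = [y - mean_y for y in ys]  # deviations from the mean of the second series
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("a series without variation has no defined correlation")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Invented series, for illustration only.
x = [1, 2, 3, 4, 5]
r_direct = correlation(x, [2, 4, 6, 8, 10])   # proportional increase
r_inverse = correlation(x, [10, 8, 6, 4, 2])  # proportional decrease
r_none = correlation(x, [1, -1, 0, -1, 1])    # deviations show no common movement

print(f"direct:  {r_direct:+.2f}")
print(f"inverse: {r_inverse:+.2f}")
print(f"none:    {r_none:+.2f}")
```

A ratio of zero computed in this way shows only that the paired deviations do not move together in a straight-line fashion. With series drawn from actual business data the result will generally fall somewhere between the extremes.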

Probability, as briefly described in Chapter XI, was said to 
supply a basis for the theory of error. Under conditions of 
pure chance, frequency measurements describe the normal law 
of error curve. The basis for expecting such curves is found 
in games of chance such as coin tossing, selection of balls 
from an urn, etc. In spite of the fact that such distributions 
are ideal and probably never realized in actual experience, 
they are the basis for much of our statistical reasoning. 

^1 Pearson, Karl, The Grammar of Science, 3d Edition, Black, London,
1911, p. 157.

^2 Davenport, Eugene, Principles of Breeding, Ginn & Company, New
York, 1907, p. 453.

Back of the theory of error and of normal distributions rests
the assumption that chance freely operates, that is, that every
condition is the result of a multitude of causes, all operating 
to produce an effect, but independent of each other. Accord- 
ingly, “causes” and “effects” are characterized by variation. 

2. ILLUSTRATIONS OF CORRELATION BY THROWS OF DICE

Darbishire,¹ by throwing 12 dice 1000 times and counting
the number at each throw which had four or more spots upper-
most,² secured the results shown in Table 65.

TABLE 65

Table Showing the Distribution of Dice with Four or More
Spots Uppermost in 1000 Throws

Result of Throw    Frequency    Result of Throw    Frequency
       0                0              7               179
       1                3              8               129
       2               15              9                64
       3               55             10                11
       4              110             11                 2
       5              208             12                 1
       6              223


That is, chance operating freely produced a distribution
closely approaching the normal type. The significant thing,
however, is that it is not perfectly normal. If another set of
1000 trials of the same kind were made, a similar approxima-
tion to normal distribution would be secured. The probability
is almost certain, however, that the results in the second case
would not be exactly the same as those in the first.³ That
is, the "causes" give varying "effects."

¹ Darbishire, A. D., "Some Tables for Illustrating Statistical Correla-
tion," in Memoirs and Proceedings of the Manchester Literary and
Philosophical Society, Vol. 51, No. 16, 1907. This is in continuation of
a similar study made by Weldon, W. F. R., "Inheritance in Animals
and Plants," pp. 81-100, in Lectures on the Method of Science, edited
by T. B. Strong, Oxford, 1906.

² The probability that any side of a perfect cube if thrown will come
up is equal to that of any other side. The probability that a certain
side will come up is 1/6.
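The closeness of Table 65 to a chance distribution can be checked against the binomial expectation. The following is a minimal sketch, not part of Darbishire's work; it assumes perfect dice, so that each die shows four or more spots with probability 1/2, and the variable names are illustrative only.

```python
from math import comb

# Observed frequencies from Table 65: index k is the number of dice
# (out of 12) showing four or more spots uppermost.
observed = [0, 3, 15, 55, 110, 208, 223, 179, 129, 64, 11, 2, 1]

N_DICE = 12
P_SUCCESS = 0.5          # assumption: perfect dice (faces 4, 5, 6 of six)
throws = sum(observed)   # 1000 throws in all

# Binomial expectation: throws * C(12, k) * p^k * (1 - p)^(12 - k)
expected = [
    throws * comb(N_DICE, k) * P_SUCCESS**k * (1 - P_SUCCESS) ** (N_DICE - k)
    for k in range(N_DICE + 1)
]

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k:2d}  observed {obs:4d}  expected {exp:7.1f}")

# Mean number of dice per throw showing four or more spots.
observed_mean = sum(k * f for k, f in enumerate(observed)) / throws
print(f"observed mean = {observed_mean:.3f}; expected mean = {N_DICE * P_SUCCESS:.1f}")
```

The observed and expected columns agree in general shape but not item by item, which is the point made in the text: chance approaches the ideal distribution without reproducing it exactly.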

Successive throws, after each of which all dice are returned 
to the receptacle and thrown again, are entirely distinct. 
There is no connecting link between them which makes them 
stand in the relation of cause and effect. The different sets 
of trials and each throw in each trial are independent. 

If two such trials of 500 throws each are tabulated so that
the result in each first throw is paired with that in each second
throw, the detail of Table 66 is secured. This is a double
frequency table, provision being made in the stub for record-
ing the results in the first throws, and in the caption, for the
results in the second throws.

TABLE 66 

Table Giving the Results of 500 Pairs of Throws of 12 Dice When
All Those Thrown the First Time Were Thrown the Second
Time *


Second Throws 




0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 


Total 


1 

9 

24 

57 

112 

101 

94 


31 

m 

~2 

T 

0 

i 














1 

2 

— 

— 

— 

1 

— 

1 

A 

— 

— 

1 

1 

— 

— 

— 

2 

3 

D 

31 

— 





1 

1 

4 

“I 

7 

8 

5 

4 

1 

1 

1 



4 

52 

— 

— 

4 

4 

7 

9 

6 

12 

5 

5 

— 

— 

— 

First 5 

95 

— 

— 

3 

5 

13 

26 

14 

14 

12 

6 

1 

1 

— 

Throws 6 

123 

— 

— 

1 

6 

15 

25 

24 

28 

15 

6 

2 

1 

— 

7 

87 

— 

— 

1 

5 

7 

16 

22 

15 

13 

6 

1 

— 

1 

8 

66 



— 



1 

7 

15 

19 

12 

6 

6 

— 

— 

— 

9 

33 

— 

1 

— , 

1 

2 

9 

7 

6 

6 

— 

1 

— 

— 

10 

5 

— 

— 

— 

— 

2 

— 

1 

2 

— 

— 

— 

— 


11 


— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

12 
















* The order of the units on the ordinate scale is reversed in this
instance from that usually followed.


An inspection of the table shows little or no connection be-
tween the results secured in the first and in the second
throws of each trial. For each of the precise results in the
first throws there is a variety of results in the second. Simi-
larly, for each of the precise results in the second throws there
is a variety of results in the first. For instance, when there
are 7 dice in the first throws with 4 or more spots upper-
most, there are from 2 to 12 with 4 or more in the second
trials. Dispersion is equally noticeable in the opposite direc-
tion. When 8 are secured in the second trials, the correspond-
ing numbers in the first throws vary from 1 to 9.

³ Cf. Weldon, W. F. R., op. cit., for the results of three trials.

The totals for the first as well as for the second throws give
close approximations to the normal curve. The most probable
number of dice showing 4 or more spots uppermost in a throw
of twelve is six, but the number may be anything between
zero and 12. The concentration at or near six in the totals
and in the arrays (distributions in lines and columns) shows
this to be true.


TABLE 67 

Table Giving the Results of 500 Connected Throws of 12 Dice, in
Each Second Throw of Which 3 Dice Were Left Down and
Counted *




Second Throws

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 


Total 


2 

7 

31 

55 

82 

111 


71 

25 

7 

1 

— 

0 

ir 














1 

2 

— 

— 

— 

— 

1 

— 

1 

— 

— 

— 

— 

— 

— 

2 

7 

— 

— 

— 

— 

— 

— 

6 

1 

— 

— 

— 

— 

— 

3 

20 

— 

1 

1 

5 

2 

2 

4 

5 

— 

— 

— 

— 

— 

4 

64 



— 

1 

8 

m 

21 

16 

6 

6 

— 

— 

— 

— 

First 5 

92 

— 

— 

4 

3 

12 

15 


22 

9 

3 

1 

— 

— 

Throws 6 

123 

— 

1 

— 

m 

m 

17 

^^3 

28 

22 

5 

1 

— 

— 

7 

97 



— 

1 

4 

9 

17 

18 

24 

16 

5 

3 

— 

— 

8 

54 



— 



1 

5 

6 

10 

14 

8 

7 

2 

1 

— 

9 

30 







— 

4 

31 

9 

6 

6 

2 

— 

— 

— 

10 

10 

— 

— 

— 

— 

— 

1 

1 

1 

4 

3 

— 

— 

— 

11 

1 

— 

— 

— 

— 

— 

— 

— 

1 

— 

— 

— 

— 

— 

12 


— 

— 

— • 





— 







* See note to Table 66.

Independent forces in a universe of chance gave the results
in Table 66. But the "chance" distribution in the first throws
may be made to determine (cause) those in second throws.

Such "causation" was accomplished by Darbishire as fol-
lows: In order to connect or relate the two throws of each
pair, he repeated the experiment, first leaving down and count-
ing in the second throw of each pair one, then two, then three,
etc., of the dice which previously had been stained red so as
to distinguish them from the others. The experiment was con-
tinued until all of the 12 dice thrown in the first were left
down for the second throws. The results when 3, 5, and 10
dice were left down are given in Tables 67, 68, and 69, respec-
tively. A graphic picture of the dispersion of the throws is
shown in Figure 77.
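Darbishire's procedure can be imitated by simulation. The sketch below is an illustration only, not his method of tabulation: the function names, the seed, and the number of pairs are assumptions made for the example. On the assumption of perfect, independent dice, the dice left down supply the only common element, and the expected coefficient of correlation between first and second throws is k/12 when k dice are left down.

```python
import random
from math import sqrt

def pearson_r(pairs):
    """Coefficient of correlation for a list of (first, second) results."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sum_x2 = sum((x - mean_x) ** 2 for x, _ in pairs)
    sum_y2 = sum((y - mean_y) ** 2 for _, y in pairs)
    # Algebraically the same as sum_xy / (n * sigma_1 * sigma_2).
    return sum_xy / sqrt(sum_x2 * sum_y2)

def connected_throws(n_pairs, left_down, n_dice=12, seed=1):
    """Simulate pairs of throws in which `left_down` dice are not rethrown."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # True marks a die showing four or more spots.
        first = [rng.randint(1, 6) >= 4 for _ in range(n_dice)]
        rethrown = [rng.randint(1, 6) >= 4 for _ in range(n_dice - left_down)]
        second = first[:left_down] + rethrown   # dice left down are counted again
        pairs.append((sum(first), sum(second)))
    return pairs

for k in (0, 3, 5, 10, 12):
    r = pearson_r(connected_throws(20000, k))
    print(f"{k:2d} dice left down: r = {r:+.3f}  (k/12 = {k / 12:.3f})")
```

With 500 pairs, as in the tables, the simulated coefficients scatter more widely about k/12 than they do with the larger number of pairs used here.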

In each pair of trial throws, in which one or more of the dice 
is left on the board and counted in the second throw, there 
is a common element. That is, the first is in part a cause of 

TABLE 68 

Table Giving the Results of 500 Connected Throws of 12 Dice, in
Each Second Throw of Which 5 Dice Were Left Down and
Counted *



m 

Second Throws 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

m 

11 

12 





11 

n 

54 

93 

112 

118 

m 

21 

9 

2 

— 

0 

1 














1 

2 

— 

— 

— 

— 

1 

1 

— 

— 

— 

— 

— 

— 

— 

2 

11 





3 

1 

5 

1 

1 

— 

— 

— 

— 

— 

— 

3 

26 

— 

— 

3 

3 

8 

4 

4 

4 

— 

— 

— 

— 

— 

4 

69 





3 

6 

9 

21 

14 


5 

1 

— 

— 

— 

First 5 

83 

— 

— 

— 

4 

11 

23 

21 

15 

9 

— 

— 

— 

— 

Throws 6 

109 





1 

3 

9 

18 

27 

29 

16 

3 

2 

1 

— 

7 

95 





1 

2 

5 

14 

24 

28 

10 

7 

4 

— 

— 

8 

* 63 






1 

5 

9 

10 

18 

14 

4 

2 

— 

— 

9 

31 





— 

— 

— 

2 

9 

13 

4 

3 

— 

— 

— 

10 

10 

— 

— 

— 

— 

1 

— 

2 

1 

2 

3 

1 

1 

— 

11 

12 

1 


— 

•— 

— 

— 

— 

— 


— 

— 

— 

— 

— 


* See note to Table 66.

the second, exerting an influence in proportion to its size.
But the distributions in none of the cases, if the trials were
repeated, would necessarily follow the order here given.
Causes never operate at different times under exactly the
same conditions, and the effects that follow from them are not
always and necessarily the same. To duplicate the conditions
under which causes operate will not necessarily duplicate the
effects. "Duplication" after all in any way except as approxi-
mation is impossible in actual life.

How nearly economic and business phenomena remain homo- 
geneous for any appreciable period, even in an approximate 
sense, is always doubtful. The forces affecting them are 
always in a state of flux governed as they are by population 
composition, state of trade, distribution of wealth, custom, 
fad, fashion, prejudice, etc. The whole range of human re- 
action is exhibited in more or less degree. Statistics under 

TABLE 69

Table Giving the Results of 500 Connected Throws of 12 Dice, in
the Second Throws of Which 10 Dice Were Left Down and
Counted *




Second Throws 



m 

1 

2 

3 

4 

6 

6 

7 

8 

9 

m 

11 

12 


Total-~> 

1 

2 

7 

24 

55 

93 

111 

mm 

64 

31 

11 

1 



4- 














0 

1 

1 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

1 

l> 

1 

*T 

— 

1 

Q 

K 

— 

— 

— 

— 

— 

— 

— 

— 

— 

z 

24 



1 

3 

0 

8 

9 

3 




— 





— 


First 4 

55 

— 

— 

2 

fEl 

18 

19 

6 

— 

— 

— 

— 

— 

— 

Throws 5 

110 

— 

— 

— 

1 

24 

43 

32 

10 

— 

— 

— 

— 

— 

6 

93 

— 

— 

— 

— 

4 

22 

37 

24 

6 

— 

— 


— 

7 

96 

— 

— 

— 

— 

— 

6 

27 

39 

19 

5 

— 

— 

— 

S 

60 

— 

— 

— 

— 

— 

— 

9 

17 

24 

9 

1 

— , 

— 

9 

42 

— 

— 

— 

— 

— 

— 

— 


14 

11 

7 

— 

— 

10 

10 

— 

— 

— 

— 

— 

— 

— 

— 

1 

6 

2 

1 


11 

1 

— 

— 


— 

— 

— 

— 

— 

— 

— 

1 

- — 


12 


— 

— 


— 

— 

— 

— 


— 

— 



•• IT 


* See note to Table 66.

such circumstances often reveal a partial story, are not com-
parable from time to time and from place to place, and taken
alone constitute a weak and uncertain base upon which to
establish cause-and-effect relations.

FIGURE 77

Graphic Figures Illustrating Correlation by Means of 500
Pairs of Throws of Dice

[Figure: two panels, "Second Throws Independent of First Throws" and
"Second Throws Dependent on First Throws: 5 Dice in Common."]

IV. The Measurement of Correlation

Correlation and narrow causation are different. Whether
phenomena stand in the relation of "cause and effect" and if
so which is cause and which is effect can never be determined
statistically.¹ Correlation, or association between them, may,
however, be determined in this manner. It is quite as possible
in two or more series to measure the congruence of change
of the corresponding items from a norm or standard as it is to
describe in a single series the manner in which the deviations
are distributed about a mean. Indeed for both, much the
same type of reasoning applies.

1. THE "SUM PRODUCT" METHOD

In order to understand the measure of correlation most com- 
monly used in statistical analysis, it is necessary very briefly 
to describe the conditions under which it was developed. 

(1) The Assumptions Upon Which the Pearsonian Coefficient 
of Correlation is Based 

What has come to be known as the Pearsonian coefficient
of correlation was conceived by Sir Francis Galton in connec-
tion with his work on heredity. In the form in which it is
now used it is the creation of Karl Pearson, the English bio-
metrician and statistician. It has since become the tool of
biometricians,² zoologists,³ breeders,⁴ psychologists,⁵ and econ-
omists.⁶ Pearson, in explaining what is meant by correlation,
says:

¹ Hooker, "Correlation of the Marriage Rate with Trade," Journal
of the Royal Statistical Society, Vol. 64, p. 485.

² See the journal Biometrika and the writings of Sir Francis Galton,
Karl Pearson, C. B. Davenport, H. M. Vernon, et al.

³ Among the leading is Harris, J. A., of the Carnegie Institution of
Washington, D. C. See his "An Outline of Current Progress in the
Theory of Correlation and Contingency," in American Naturalist, Jan-
uary, 1916, Vol. L, pp. 53-64.

⁴ Davenport, Eugene, The Principles of Breeding, Ginn, New York, 1907.

⁵ Thorndike, E. L., Mental and Social Measurements, New York, 1913;
Brown, William, The Essentials of Mental Measurement, Cambridge
(England), 1911; Whipple, Guy M., Manual of Mental and Physical
Tests, Baltimore, 1914.

"Two organs in the same individual, or in a connected pair of indi-
viduals, are said to be correlated, when a series of the first organ of a
definite size being selected, the mean of the sizes of the corresponding
second organs is found to be a function of the size of the selected
first organ. If the mean is independent of this size, the organs are
said to be non-correlated. Correlation is defined mathematically by
any constant, or series of constants, which determine the above func-
tion."⁷

As Pearson explains, the word "organ" is understood to cover
any measurable characteristic of an organism, and the word
"size" its quantitative value.

The concepts have been illustrated by Professor Persons as 
follows: 

"Suppose that we are attempting to answer the question, do tall
fathers have tall sons? In this case, stature is the 'measurable char-
acteristic' in each of 'a connected pair of individuals.' Suppose the
average stature of all adult males is sixty-six inches; suppose we
select several thousand fathers whose stature is seventy-two inches
or more, six inches above the average for all, and find the mean
stature of the sons of this group of tall fathers to be sixty-nine inches,
three inches above the average stature of all adult males. If similar
results appear consistently for selected fathers and their sons, we
may conclude that the stature of sons depends upon the stature of
fathers; or, in other words, the stature of sons is a function of the
statures of fathers; or, in still other words, the statures of fathers
and sons are correlated. We may be able to state a 'law' of the
inheritance of stature, or give the stature of sons as a function of the
stature of fathers. It is clear, however, that although tall fathers
may, in general, have tall sons, an individual tall father may have a
short son, or perhaps several sons, some tall, some short. That is,
two concepts are involved; first, the law or function or equation
expressing the relation on the average, existing between the two vari-
ables involved, and second, the degree with which individual cases
adhere to the law.

"To illustrate the first concept, it may be possible to say that for
an average deviation in stature of fathers of n inches from the mean
for adult males, the stature of the sons of those fathers will deviate
in the same direction by $\frac{n}{2}$ inches. This is a law or function, the
first concept that we have named.¹ But the statement of the func-
tion does not describe the situation completely. How accurately does
the function describe the situation; how systematic is the relationship
between statures of fathers and sons; are the exceptional cases few
or many? These are different forms of a question which requires a
quantitative answer. Such an answer is given by the coefficient of
correlation. The coefficient is unity if there is no exception to the
law of statures; it is zero if the statures of father and son are inde-
pendent of each other; it is negative if tall fathers, in general, have
short sons; it has a numerical value varying inversely with the degree
of divergence (both in number of cases and magnitude) of the indi-
vidual cases from a linear relationship."²

⁶ Hooker, R. H., op. cit.; Yule, Introduction to Theory of Statistics,
London, 1911; Bowley, A. L., Measurement of Groups and Series, Lon-
don, 1903; Elderton, W. Palin, Frequency Curves and Correlation, Lon-
don, 1906 (?); Persons, W. M., "The Correlation of Economic Statis-
tics," Publications of the American Statistical Association, Vol. XII,
December, 1910, pp. 287-322; Moore, H. L., Economic Cycles: Their Law
and Cause, New York, 1914; Persons, Warren M., "The Construction of
a Business Barometer Based upon Annual Data," in American Economic
Review, December, 1916, pp. 739-769. See also the notes and references
to Chapter XIV.

⁷ Pearson, Karl, "Mathematical Contributions to the Theory of Evolu-
tion, III. Regression, Heredity, and Panmixia," Philosophical Trans-
actions of the Royal Society of London, 1896, A. 187, p. 257.

In the language used by Professor Persons, certain expres-
sions appear which were explained in earlier chapters. For
instance, "all adult males" (a "population"); "average stature"
(a mean or standard); "six inches above the average" (devia-
tions from an average); "mean stature" (an average or
norm); "on the average" (an expression indicative of con-
sistency of occurrence, that is, high probability); "how systematic is
the relationship between statures of fathers and sons" (an ex-
pression indicative of the nature of dispersion). That is, the
illustration applies to (1) paired or connected populations or
samples; (2) averages characteristic of both; (3) some meas-
urements of the deviations from their respective averages;
(4) systematic and regular distribution of the deviations from
their averages; and (5) a measurement of the congruence of
change in the corresponding deviations in both samples from
their respective averages. Now, it is apparent that by the
use of some of these statistical devices, frequency and other
distributions are measured and compared. Only two new
ideas are introduced: (1) connected or related series, and
(2) the measurement of concurrent deviations.

¹ Note omitted.

² Persons, W. M., "Indices of Business Conditions," Review of Eco-
nomic Statistics, Cambridge, Mass., January, 1919, p. 131.

The Pearsonian coefficient of correlation rests upon two
assumptions. The first is that a large number of independent
causes are operating in each of the series correlated so as to
produce normal or probability distributions. Such causes are
at work in determining the successive results secured by
Darbishire in throwing his twelve dice. They undoubtedly are
also operating to produce the heights of both fathers and sons
in Professor Persons' illustration. Such series, as we have
learned, can be summarized by the use of averages and by
measures and coefficients of dispersion.

The second assumption latent in the Pearsonian coefficient is
that the forces so operating are not independent of each other
(in the random sense) but that they are related in a causal
way. This is evidently the case in the second throws of dice
wherein some of the "effects" are determined by the condi-
tions in the first throws, chance, however, having fully oper-
ated to produce the result. It is also true in the case of the
heights of sons if they are correlated with those of their fathers.

To count in the second dice throws part of the results
secured in the first throws does not have the effect of produc-
ing distributions any less normal in the second throws.
Chance operates just the same. The only thing which is done
in the illustration is to transfer to the chance distribution in
the second throws some of the chance results in the first
throws. Any throw is as much governed by chance as any
other throw. Accordingly, such a transfer is legitimate. Simi-
larly, the heights of a large number of fathers tend to conform
to the normal probability curve. Such a condition may also
be expected of those of their sons. If the forces producing
these results are not independent of each other, then it is said
that the heights of sons are correlated with those of their
fathers.

Upon the bases of these two assumptions, Pearson con-
structed the formula for his coefficient. His own words in
respect to the organs correlated are as follows:

The assumptions are: first, "that the sizes of this complex of
organs are determined by a great variety of independent contributory
causes, for example, magnitudes of other organs not in the complex,
variations in environment, climate, nourishment, physical training,
and innumerable other causes, which cannot be individually observed
or their effects measured"; second, "that the variations in intensity
of the contributory causes are small as compared with their absolute
intensity, and that these variations follow the normal law of distribu-
tion."¹

(2) The Pearsonian Coefficient of Correlation Formula

The Pearsonian coefficient of correlation formula ² is

$$r = \frac{\Sigma xy}{n \sigma_1 \sigma_2}$$ where

r = the coefficient of correlation
xy = the product of a concurrent pair of deviations
$\Sigma$ = the process of summation
$\sigma_1$ = the standard deviation, S.D., of one (X) series
$\sigma_2$ = the standard deviation, S.D., of the other (Y) series
n = total number of pairs of items

This formula gives values ranging from -1 through 0 to +1.³

When Σxy is positive, correlation is positive; when it is
negative, correlation is negative. Positive correlation may re-
sult from positive items (that is, items larger than the mean)
in one (X) series being associated with positive items (that
is, items larger than the mean) in the other (Y) series, or from
negative items (those smaller than the mean) in one (X)
series being associated with negative items (those smaller
than the mean) in the other (Y) series. Negative correlation
results from positive values (those larger than the mean) in
one (X) series being associated with negative values (items
smaller than the mean) in the other (Y) series, or vice versa.

When positive and negative deviations in the two series
are indifferently associated, correlation tends to be zero, reach-
ing this limit when the negative products exactly counter-
balance those which are positive.

¹ Pearson, op. cit., p. 262.

² For the method by which this formula is derived, see Yule, G. Udny,
Introduction to the Theory of Statistics, Griffin, London, 1911, pp. 168-
174.

³ For proof of this, see Bowley, A. L., Elements of Statistics, 4th Ed.,
King, London, 1920, p. 354.
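The formula and the three cases just described (positive, negative, and zero correlation) may be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed routine: the function name `pearson_r` and the small series are invented for the example, and the standard deviations use the divisor n, as in the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = (sum of xy) / (n * sigma_1 * sigma_2), deviations taken from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    x = [v - mean_x for v in xs]          # deviations in the X series
    y = [v - mean_y for v in ys]          # deviations in the Y series
    sum_xy = sum(a * b for a, b in zip(x, y))
    sigma_1 = sqrt(sum(a * a for a in x) / n)
    sigma_2 = sqrt(sum(b * b for b in y) / n)
    return sum_xy / (n * sigma_1 * sigma_2)

# Hypothetical series chosen only to show the three cases.
rising = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])    # like signs paired throughout
falling = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # unlike signs paired throughout
balanced = pearson_r([-1, -1, 1, 1], [-1, 1, -1, 1])     # products cancel exactly

print(rising, falling, balanced)
```

In the last series the two positive products are exactly counterbalanced by the two negative ones, so the numerator, and with it the coefficient, is zero.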

It should be noticed that the sum of the products of the
deviations, Σxy, is a function both of the amount and sign
of the deviations. Moreover, since the deviations are taken
from the respective means of the series and these may differ
not only in size but also in the unit of measurement, some
divisor is necessary in order to reduce them to the same de-
nomination. The standard deviation in each case is the ap-
propriate factor here as it is in the measurement of relative
dispersion. But since the deviations are multiplied together,
the suitable divisor is the product of the standard deviations.
Correlation coefficients, however, are compared for series with
different numbers of pairs of items. Accordingly, n is inserted
in the denominator, thus giving an average value independent
of the number. Accordingly, the correlation coefficient, r, of
two sets of values, each expressed in standard deviations as
units, is the arithmetic average of the products of deviations
of corresponding values from their respective means.
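That the coefficient is an average of products in standard-deviation units, and therefore does not depend on the unit of measurement, can be verified on a small example. The statures below are hypothetical figures invented for the illustration, and the function name is likewise an assumption; `pstdev` from the Python standard library is used because it divides by n, as the text does.

```python
from statistics import fmean, pstdev

def r_from_standard_units(xs, ys):
    """Average of the products of deviations, each expressed in S.D. units."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)    # standard deviations with divisor n
    products = [
        ((a - mean_x) / sd_x) * ((b - mean_y) / sd_y)
        for a, b in zip(xs, ys)
    ]
    return fmean(products)

# Hypothetical statures of five fathers and their sons, in inches.
fathers_in = [65, 67, 68, 70, 72]
sons_in = [66, 68, 67, 70, 71]

# The same fathers measured in centimeters; the sons are left in inches.
fathers_cm = [h * 2.54 for h in fathers_in]

r_inches = r_from_standard_units(fathers_in, sons_in)
r_mixed = r_from_standard_units(fathers_cm, sons_in)
print(f"{r_inches:.4f}  {r_mixed:.4f}")
```

The two results agree, since dividing each deviation by its own standard deviation removes the unit in which the series was recorded.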

"Hence r is a quantity which depends on all the observations, is
zero when independence is complete and Mean xy = 0, is independent
of the units in which X and Y are measured, increases whenever a
positive x_t is found with a positive y_t or a negative x_t with a negative
y_t, but only reaches the value +1 (which it can never exceed) when
x and y are connected rigidly by the equation y = x × constant. If
positive x's are found with negative y's and vice versa, r varies from
0 to -1.

"r is therefore a sensitive measurement of the amount of correla-
tion."¹

Now it is apparent from Table 66 that the two series, first
throws (Y) and second throws (X), are not correlated. That
is, neither high nor low values in (Y) are associated with high
or low values or vice versa in (X). With essentially the same
values in the second throws varying values are found in the
first throws. Similarly, with essentially the same values in
the first throws different values are found in the second throws.
Relations are different in Table 69. In this case, as the values
in the first throws increase so do those in the second throws.
That is, the two series are positively correlated. If with in-
creases in the first were found decreases in the second, or
vice versa, then the two series would be negatively correlated.
If the association were not greater than that secured from
random selection, as in Table 66, correlation would be small,
the coefficient approaching zero.

While such frequency tables as 68 and 69 indicate correla-
tion, they do not measure it. The fact of correlation is gen-
erally evident from the nature of the distribution of the
frequencies in the lines and in the columns. If the area of
concentration extends from the upper left to the lower right
corners of the frequency surface, then correlation is positive;
if from the upper right to the lower left, it is negative.² If
neither arrangement is apparent, as in Table 66, correlation is
small and the type in doubt.

Moreover, if correlation is present, the arithmetic means
and the medians of the rows and of the columns form a more
or less regular progression.³ By this test the first and second
throws of dice, as shown in Table 66, are not correlated. The
medians in the rows and columns are constant at about 5-6.
On the other hand, in the series in Table 69, which are known
to be highly correlated, the progression of the medians of the
rows and columns is strikingly regular. In terms of aver-
ages, large values in one series are associated with large
values in the other series.

¹ Bowley, A. L., Elements of Statistics, 4th Ed., King, London, 1920,
pp. 354-355.

² The nature of correlation, that is, whether positive or negative, as
indicated by the direction which the concentration takes, is obviously
determined by the ways in which the scales on the respective axes are
written.

In general, if the average values for the detail in the rows
and columns are linear (best described by straight lines), then
the Pearsonian coefficient is a suitable measure for measur-
ing correlation. The coefficient may be computed both from
grouped and ungrouped data. While the methods are some-
what different, the principles are identical.


(3) The Calculation of the Pearsonian Coefficient
of Correlation

a. In Ungrouped Series 

In an address on Concentration of Power Supply, Mr.
Samuel Insull, President of the Commonwealth Edison Com-
pany, Chicago, said in relation to statistics there considered:
"The income per kilowatt hour goes down pretty steadily, the
output per capita goes up pretty steadily, the load factor im-
proves as selling price is lowered, and the output per capita
goes up as the selling price is lowered."⁴ These conclusions
were based upon a consideration of the United States Census
figures for 1912 on the generation of electrical energy giving
the capacity load factor,⁵ output per capita, and income per
kilowatt hour by states. It is the correlation of the load fac-
tor and the income per K.W.H. which is measured in Table
70 and the accompanying computations.⁶

³ See Figure 78.

⁴ Address before the Finance Forum of the Young Men's Christian
Association, New York, 1914, privately printed, p. 26.

⁵ Ratio of average load to capacity in this case, p. 26.

⁶ These figures are inadequate for a satisfactory study of this character.
They will, however, serve to illustrate the manner in which similar data
may be compared.

TABLE 70

Table Showing by States the Capacity Load Factor and the
Income per Kilowatt Hour in the Generation of
Electrical Energy

                Capacity  Deviations               Income      Deviations
                Load      from Average Deviations  per K.W.H.  from Average Deviations  Products of
                Factor    Load Factor  Squared     (in cents)  Income       Squared     Deviations
State           X         x            x²          Y           y            y²          xy

Total (47 states)  av. 21.4             4144.61     av. 3.45                 177.2011    -444.735

TABLE 70 (Continued)

                Capacity  Deviations               Income      Deviations
                Load      from Average Deviations  per K.W.H.  from Average Deviations  Products of
                Factor    Load Factor  Squared     (in cents)  Income       Squared     Deviations
State           X         x            x²          Y           y            y²          xy

N. Car.         18.7      - 2.7          7.29      1.90        -1.55         2.4025     + 4.185
N. Dakota       12.9      - 8.5         72.25      7.01        +3.56        12.6736     -30.260
Ohio            18.6      - 2.8          7.84      2.89        - .56          .3136     + 1.568
Oklahoma        19.7      - 1.7          2.89      4.54        +1.09         1.1881     - 1.853
Oregon          20.7      -  .7           .49      2.39        -1.06         1.1236     +  .742
Penn.           15.7      - 5.7         32.49      4.14        + .69          .4761     - 3.933
Rhode Island    18.4      - 3.0          9.00      3.71        + .26          .0676     -  .780
S. Carolina     30.7      + 9.3         86.49      1.24        -2.21         4.8841     -20.553
S. Dakota       14.0      - 7.4         54.76      4.58        +1.13         1.2769     - 8.362
Tenn.           17.4      - 4.0         16.00      3.24        - .21          .0441     +  .840
Texas           27.6      + 6.2         38.44      3.38        - .07          .0049     -  .434
Utah            26.0      + 4.6         21.16      1.75        -1.70         2.8900     - 7.820
Vermont         21.9      +  .5           .25      2.07        -1.38         1.9044     -  .690
Virginia         8.1      -13.3        176.89      2.65        - .80          .6400     +10.640
Wash.           14.2      - 7.2         51.84      4.33        + .88          .7744     - 6.336
West Va.        16.1      - 5.3         28.09      2.60        - .85          .7225     + 4.505
Wisconsin       24.9      + 3.5         12.25      2.92        - .53          .2809     - 1.855
Wyoming         16.1      - 5.3         28.09      6.24        +2.79         7.7841     -14.787


In this case the "capacity load factor" constitutes one (X)
series, and the "income per K.W.H." the other (Y) series.
The steps in calculating the coefficient of correlation are as
follows:

1. Determine the arithmetic mean in each of the series.

2. Calculate the deviations (differences) of each of the
items in the series from their respective arithmetic means.
(The deviations of the items in the X series are given in the
column marked x; for those in the Y series, in the column
marked y.)

3. Square the deviations for each of the series. See
columns marked x² and y².

4. Multiply together the corresponding deviations for the
X and the Y series (that is, the amounts in columns x and y).

5. Algebraically sum or total the products obtained in 4.
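Steps 2 to 5 may be traced for a few lines of Table 70. The sketch below is illustrative only; it takes the averages of 21.4 and 3.45 from the total line of the table as given (step 1), and the variable names are assumptions made for the example.

```python
# Averages printed in the total line of Table 70 (all 47 states): step 1.
MEAN_X = 21.4    # capacity load factor
MEAN_Y = 3.45    # income per K.W.H., in cents

# (state, load factor X, income per K.W.H. Y) for three lines of the table.
rows = [
    ("N. Car.", 18.7, 1.90),
    ("N. Dakota", 12.9, 7.01),
    ("Oregon", 20.7, 2.39),
]

worked = []
for state, big_x, big_y in rows:
    x = big_x - MEAN_X                   # step 2: deviation in the X series
    y = big_y - MEAN_Y                   # step 2: deviation in the Y series
    worked.append({
        "state": state,
        "x": round(x, 2),
        "y": round(y, 2),
        "x2": round(x * x, 4),           # step 3: deviations squared
        "y2": round(y * y, 4),
        "xy": round(x * y, 3),           # step 4: product of deviations
    })

# Step 5, carried out for these three lines only (the table totals all 47).
partial_sum_xy = sum(w["xy"] for w in worked)

for w in worked:
    print(w)
print("partial sum of products:", round(partial_sum_xy, 3))
```

The printed deviations, squares, and products can be compared with the corresponding entries in the table.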


The total secured from step five gives the numerator — 2 xy 
— of the coefficient. But the standard deviation in each series 
is also required. This is determined by using the formula,^ 



In the illustration in Table 70 the d^s in the X 


series are called x’s; those in the Y series, Accordingly, 

the formula for the X series is for the Y series, 

1 n 


\ 


n 


Each of the amounts required for the coefficients are now 
available except the n of the denominator, n means the num- 
ber of pairs of values — in this case 47, since there are 47 
states for which data are available. 


V 

V 


The standard deviation of the X series is 
= 9,39 ; that of the 


4144.61 


47 


series 


V 

ies, 


n 

n 


or 


or 


177.2011 

47 


= 1.95. Inserting these and the other appropriate 
S xy — 444.735 


values in the formula, r = 


gives r : 


njidg' ° 47 X 9.39 X L95 

= —0.517. That is, the two series are negatively correlated. 
The quantity — 0.517, is a measure of the congruence of 
^ See p. 349. 

* For a discussion of the “significance” of this coefficient, see pp. 428- 
430 
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change in the deviations of the items in the two series. It is 
the mean of the products of the deviations — ^measured from 
the averages of the series — expressed in units of standard de- 
viations. The negative sign (— ) indicates that on the aver- 
age, positive and negative deviations, or vice versa, are 
associated. The decimal (0.517) shows the degree of such as- 
sociation. If an increase in one series were associated with a 
proportional decrease in the other series, or vice versa, the 
ratio would be — 1. 
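The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the short sample lists in the usage note are assumptions made here, and the only figures taken from the text are the Table 70 totals (the sum of products, the two standard deviations, and n = 47).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Coefficient of correlation by the deviation method (steps 1 to 5)."""
    n = len(xs)
    mean_x = sum(xs) / n                      # step 1: arithmetic means
    mean_y = sum(ys) / n
    dev_x = [v - mean_x for v in xs]          # step 2: deviations x and y
    dev_y = [v - mean_y for v in ys]
    sum_x2 = sum(d * d for d in dev_x)        # step 3: squared deviations
    sum_y2 = sum(d * d for d in dev_y)
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))  # steps 4 and 5
    sigma_x = sqrt(sum_x2 / n)                # standard deviations
    sigma_y = sqrt(sum_y2 / n)
    return sum_xy / (n * sigma_x * sigma_y)


# Check against the totals reported for Table 70.
N_STATES = 47
SIGMA_X = round(sqrt(4144.61 / N_STATES), 2)   # 9.39, as in the text
SIGMA_Y = 1.95                                 # value used in the text
r_table70 = -444.735 / (N_STATES * SIGMA_X * SIGMA_Y)
print(round(r_table70, 3))
```

With two short made-up series, `pearson_r([1, 2, 3], [2, 4, 6])` gives perfect positive correlation and `pearson_r([1, 2, 3], [3, 2, 1])` perfect negative correlation, which matches the limiting ratio of -1 described above.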


b. In Grouped Series 

The ungrouped series in Table 70 might be tabulated in
double frequency form similar to the tables showing throws
of dice. If this were done, provision would be made in the
stub (or the caption) for the load factor per cents, and in
the caption (or the stub) for the K.W.H. amounts. The
states would then be tallied in columns and rows according
to the unit classes in stub and caption.

In order to show the method of calculating Pearson's r for
grouped data, rental payments made by retail clothing stores
are used. The question upon which information is desired is
as follows: In what manner and to what degree, if any, in
retail clothing stores are the amounts of rent paid in units
of sales correlated with rental payments in units of floor space?
A sample of 150 stores is used.

The data for the different stores might be arranged in the 
form shown in Table 70. They would then appear as follows: 


Stores    Rent per $100    Rent per 100 Sq. Ft.
          of Sales         of Floor Space

1         $ .75            $30.00
2          1.25             32.00
3           .92             34.00
4           .87             36.00
etc.       etc.             etc.

In this case, however, the form of arrangement selected 
is the double frequency table 71. 
TABLE 71

Table Showing the Method of Calculating the Correlation
Coefficient for Grouped Series, Deviations Being Taken from
the True Arithmetic Mean

(X series, in the caption: rent per 100 square feet of floor space.
Y series, in the stub: rent per $100 of sales. The computation
columns at the right of the double frequency table follow.)

Frequencies   Deviations    Deviations   Deviations Squared   Products of the
              from Arith.   Squared      Times Frequencies    Respective Deviations
              Mean                                            in the Two Series
    f             d            d²             fd²                  (xy)

   16          +1.31          1.72           27.52              + 461.64
    5          +1.11          1.23            6.15              +  71.60
    4          + .91           .83            3.32              +  73.35
    3          + .71           .50            1.50              +  29.61
   10          + .51           .26            2.60              +  45.39
   11          + .31           .10            1.10              +  39.65
   15          + .11           .01             .15              -    .83
   17          - .09           .01             .17              +   5.73
   15          - .29           .08            1.20              +  29.44
   21          - .49           .24            5.04              +  57.87
   17          - .69           .48            8.16              + 144.00
   11          - .89           .79            8.69              + 153.17
    5          -1.09          1.19            5.95              +  93.20

Total 150                                   $71.55              +1203.82

Arith. Mean = $1.79

Y series: $\sigma_y = \sqrt{\frac{\Sigma f d^2}{n}} = \sqrt{\frac{71.55}{150}} = \$.69$

$r = \frac{\Sigma xy}{n \sigma_y \sigma_x} = \frac{+1203.82}{150 \times .69 \times 18.4} = \frac{+1203.82}{1904.40} = +.63$

P.E. = ±.033

r = +.63 ± .033

The arrangement of the data across the surface of Table 71
indicates plainly the fact of positive correlation.¹ But what
is the degree of correlation? Pearson's r gives this in precise
form.

¹ Notice the method of writing the stub classes. Positive correlation
in this case is indicated by a different alignment from that in Table 68,
for instance.

The steps to be taken in securing r are as follows:

1. Total the frequencies in the rows, that is, the numbers
of stores paying different amounts of rent per $100 of
sales. The totals are 16, 5, 4, 3, 10, 11, 15, 17,
15, 21, 17, 11, 5. Total, 150.

2. Total the frequencies in the columns, that is, the numbers
of stores paying different amounts of rent per 100 square
feet of floor space. The totals are 1, 2, 5, 10, 19, 17, 26, 11,
16, 9, 5, 7, 5, 1, 2, 14. Total, 150.

3. For each of these frequency distributions calculate the
mean (use the center of the group for precise items) and the
standard deviation. The methods by which these computations
are made have already been explained. (In order to get
S.D. in each case, each squared deviation must be multiplied by
the number of corresponding frequencies.)

4. Calculate the products of the corresponding deviations
from the means in the two series. The items in the table
deviate from the averages of both series. For instance, the
10 instances in which rent as a per cent of sales is 2.30 deviate
from the average for the entire series, 1.79, by +.51. At the
same time, they also deviate from the average of the series
showing the amount of rent paid per 100 square feet of floor
space. The average in this case is $38.6. One of them deviates
-11.1; 2 of them, -6.1; 2, +3.9; 1, +13.9; 1, +18.9;
and 3, +23.9. That is, to get the products of the deviations
of the ten items from the averages in the series it is necessary
to make the following computations:

$$\left[ (1 \times -11.1) + (2 \times -6.1) + (2 \times 3.9) + (1 \times 13.9) + (1 \times 18.9) + (3 \times 23.9) \right] \times (+.51) = 89.0 \times .51 = 45.39$$

The other amounts in the column, (xy), are secured in
similar manner.

5. Algebraically sum or total the products secured in 4
above. See the total of column (xy).

If the values secured by the above processes are inserted
in the formula, $r = \frac{\Sigma xy}{n \sigma_x \sigma_y}$, the result is as follows:

$$r = \frac{+1203.82}{150 \times .69 \times 18.4} = +.63$$

That is, correlation is positive, the + sign indicating this fact.
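As a check on the arithmetic, the worked row of step 4 and the final ratio can be reproduced with the figures quoted in the text. The variable names below are illustrative assumptions; the numbers are those of Table 71.

```python
from math import sqrt

# Step 4 for the row of 10 stores whose rent per $100 of sales is $2.30.
dev_y = 2.30 - 1.79                      # deviation from the Y mean, +.51
# (frequency, deviation from the X mean of $38.6) for the ten stores
x_cells = [(1, -11.1), (2, -6.1), (2, 3.9), (1, 13.9), (1, 18.9), (3, 23.9)]
row_product = sum(f * d for f, d in x_cells) * dev_y

# Standard deviation of the Y series from the fd^2 column total.
sigma_y = sqrt(71.55 / 150)

# Coefficient from the column total of (xy) and the two standard deviations.
r_grouped = 1203.82 / (150 * 0.69 * 18.4)

print(round(row_product, 2), round(sigma_y, 2), round(r_grouped, 2))
```

The same pattern, a frequency-weighted sum of X deviations multiplied by the row's Y deviation, would be repeated for each of the thirteen rows to build the (xy) column.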

But it is sometimes advantageous to compute r for grouped
series by assuming the arithmetic means, and later by correcting
in each of the steps the errors due to the assumption.
The manner in which this is done is illustrated in Table 72
by using the data given in Table 71.¹

¹ The method of computing the arithmetic mean from an assumed
average is shown in Table 37. Similarly, the method of computing the
standard deviation when an assumed mean is used is illustrated in
Table 63.

The notation used is as follows:

f = frequencies in the X and in the Y series
x = deviations in steps (groups) in the X series
y = deviations in steps (groups) in the Y series
$\Sigma$ = process of summation
$d_x$ = average error of deviations in the X series
$d_y$ = average error of deviations in the Y series

The steps in computing r by this method are as follows:

1. Total the frequencies of the lines and of the columns.
(See column f and line f.)

2. Choose an average (group) in the X and in the Y series,
respectively. Draw lines at right angles across the table
enclosing the frequencies in these groups.

3. Indicate the group deviations above and below the
assumed average (group). See column y for the deviations for
the Y series; and line x for the deviations for the X series.

4. Multiply the frequencies in the two series by their
respective group deviations. See column fy for the Y series,
and line fx for the X series.

5. Square the group deviations in the two series and multiply
by their respective frequencies. See column fy² for the
Y series, and line fx² for the X series.

6. Compute the amount and nature (plus (+) or minus
(-)) of the deviations. This is done by multiplying the
respective frequencies by the amount of group deviations in the
X series. See columns $\Sigma x$ (gross), - and +.

7. From the minus (-) and plus (+) entries found in 6,
compute the net deviations. See the column of net deviations.

8. Multiply the net deviations found in 7 by the group
deviations (see column y) in the Y series (see column $\Sigma xy$).
In doing this it is necessary carefully to observe the signs of
the products.
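TABLE 72

Table Showing the Method of Calculating the Correlation
Coefficient for Grouped Series, the Deviations Being Taken
from an Assumed Arithmetic Mean

(Computations at the right of the double frequency table.)

$\Sigma fx$ pos. = 304          $\Sigma fy$ pos. = 177          $\Sigma xy$ pos. = 1113
$\Sigma fx$ neg. = 121          $\Sigma fy$ neg. = 263          $\Sigma xy$ neg. = -14
$\Sigma fx$ = 183               $\Sigma fy$ = -86               $\Sigma xy$ = 1099

$\Sigma fx^2$ = 2251            $\Sigma fy^2$ = 1836            N = 150

$d_x = \frac{\Sigma fx}{N} = \frac{183}{150} = 1.2$, $d_x^2 = 1.4$

$d_y = \frac{\Sigma fy}{N} = \frac{-86}{150} = -.6$, $d_y^2 = .4$

$\sigma_x^2 = \frac{2251}{150} - d_x^2 = 15.0 - 1.4 = 13.6$, $\sigma_x = \sqrt{13.6} = 3.7$

$\sigma_y^2 = \frac{1836}{150} - d_y^2 = 12.2 - .4 = 11.8$, $\sigma_y = \sqrt{11.8} = 3.4$

$$r = \frac{\frac{\Sigma xy}{N} - d_x \times d_y}{\sigma_x \times \sigma_y} = \frac{\frac{1099}{150} - (1.2 \times -.6)}{3.7 \times 3.4} = \frac{7.3 + .72}{3.7 \times 3.4} = \frac{8.02}{12.58} = +.64$$

$$\text{P.E.} = .6745 \, \frac{1 - r^2}{\sqrt{n}} = \pm .033$$

r = +.64 ± .033

The four quadrants of the correlation table relative to the
means are as follows:

x = -, y = +    |    x = +, y = +
----------------+----------------
x = -, y = -    |    x = +, y = -

Accordingly, the signs in the column $\Sigma xy$ are determined by
these relations.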
The foregoing computations give the deviations from the
assumed means and the data based upon them for computing
the standard deviations in the two series. But since the positive
and negative deviations in the two series do not balance
(see the totals in column fy and in line fx) the assumed are
not the correct averages. The deviations in the X series are
too large, and those in the Y series too small. Accordingly,
corrections must be made for them. This is done in the
blocks at the right of the table. The average error in the
deviations in the X series ($d_x$) and the corresponding error in
the Y series ($d_y$) must be squared, and subtracted from the
average of the respective squared deviations in order to obtain
the true standard deviations. The product of these average
errors must then be subtracted from the average of the
products, $\frac{\Sigma xy}{n}$, in order to get the true sum of the
products.

These various adjustments are carried out in the computations
at the right of the table. While the deviations are
taken in groups so also are the S.D.'s and the xy products.
Accordingly, this fact may be ignored in the final result.
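The corrections just described can be written out as a short function. This is a sketch under stated assumptions: the function and argument names are invented for illustration, and the inputs are the totals from Table 72 in group (step) units. Carried at full precision the result is about .63, which agrees with Table 71; the .64 of the hand computation appears to come from rounding the intermediate amounts to one decimal.

```python
from math import sqrt


def r_from_assumed_means(n, sum_fx, sum_fy, sum_fx2, sum_fy2, sum_xy):
    """Pearson's r from deviations taken about assumed means (group units)."""
    d_x = sum_fx / n                         # average error, X series
    d_y = sum_fy / n                         # average error, Y series
    sigma_x = sqrt(sum_fx2 / n - d_x ** 2)   # corrected standard deviations
    sigma_y = sqrt(sum_fy2 / n - d_y ** 2)
    corrected_products = sum_xy / n - d_x * d_y
    return corrected_products / (sigma_x * sigma_y)


# Totals from Table 72.
r_table72 = r_from_assumed_means(
    n=150, sum_fx=183, sum_fy=-86, sum_fx2=2251, sum_fy2=1836, sum_xy=1099
)
print(round(r_table72, 3))
```

Because both standard deviations and the products are expressed in the same group units, the class widths cancel in the ratio, which is why the grouping can be ignored in the final result.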

The coefficient of correlation between rental payments in
units of sales and rental payments per 100 sq. ft. of floor
space for the 150 retail clothing stores is as follows:

$$r = \frac{\frac{+1099}{150} - (1.2 \times -.6)}{3.7 \times 3.4} = \frac{+8.02}{12.58} = +.64\,^{*}$$

* This amount differs slightly from that secured in Table 71 because of
adjustments of decimal amounts.





(4) Regression Lines and Coefficients of Regression

But correlation between the series in Tables 71 and 72
is not perfect. The means (best values) of the columns and
rows are not identical as they would be if perfect correlation
existed.¹ In Figure 78 the means of the rows are indicated by
crosses (x x) for different values in the Y series. Similarly,
the means of the columns are indicated by circles (o o) for
different values in the X series. If perfect positive correlation
obtained, the means would fall on a single straight line.
As it is, two lines are necessary to show the relations, both
the crosses (x x) and the circles (o o) being essentially linear
in their arrangement. The best indication of the directions
which they take is given by straight lines so drawn that the
sums of the squares of the differences, measured parallel to the Y
axis, of the several points from the lines are a minimum.
These are the “best fitting lines” under the least-square
assumption.²

If the respective deviations in each series, X and Y, from
their means were expressed in units of standard deviations
(that is, if each of them were divided by the standard deviation
of the series to which it belongs) and plotted to a scale
of standard deviations, plus (+) and minus (-), the slope
of a straight line, best describing the plotted points, would
be the correlation coefficient, r.

The best fitting line of the means of the rows is AB, and
of the means of the columns, CD. These are the so-called
“regression lines,”³ their slopes being expressed in terms of
(1) the correlation coefficient, r, (2) the standard deviation,
$\sigma_x$, of the X series, and (3) the standard deviation, $\sigma_y$, of
the Y series.

¹ If, in the case of the dice throws, the second throws were taken to be
equivalent to the first throws, then the means of the columns would be
the same as the means of the rows. See Table 69 in which ten of the
dice in the first throws are counted in the second throws.

² The sum of the squares of the deviations is a minimum when taken
from the arithmetic mean. See p. 350, and reference to Yule.

³ A term introduced by Sir Francis Galton in his studies of inheritance.
As Yule suggests, such lines might more fittingly be called “characteristic
lines.” Yule, G. U., op. cit., p. 177.

                      Series X                   Series Y
                      Rent per 100 sq. ft.       Rent per $100
                      of Floor Space             of Sales

Average               $38.60                     $1.79
Standard Deviation    $18.40                     $ .69

r = + .63

The regression coefficient of X (rent per 100 square feet
of floor space) on Y (rent per $100 of sales) = $r \frac{\sigma_x}{\sigma_y}$.
Substituting the values above, we get $.63 \times \frac{18.40}{.69} = 16.79$. That is,
x = 16.79 y.

FIGURE 78

Regression Lines of Rent per Unit of Floor Space on Rent per
Unit of Sales, and Rent per Unit of Sales on Rent per
Unit of Floor Space for 150 Retail Clothing Stores

What does such a coefficient mean? If stores were selected
with rent per $100 of sales 1 per cent above the average, the
regression coefficient, 16.79, of rent per 100 square feet relative
to rent per $100 of sales indicates that we should expect the
stores selected to pay about $16.79 above the average amount
per 100 square feet of floor space. In general, if stores paying
x dollars in rent per $100 of sales above or below the mean
were selected, we should expect the amounts which they pay
in rent per 100 square feet of floor space to be 16.79 x from
the average amount so paid.

The regression coefficient of Y (rent per $100 of sales) on
X (rent per 100 square feet) = $r \frac{\sigma_y}{\sigma_x} = .63 \times \frac{.69}{18.4} = .023$. That
is, y = .023 x.

If, for instance, stores were selected which paid in rent per
100 square feet of floor space $10 more than the average, the
regression coefficient, .023, indicates that they would most
probably pay in rent per $100 of sales .23 per cent above the
average.

Lines AB (regression of X on Y) and CD (regression of
Y on X) are drawn in keeping with the respective coefficients
of regression, x = 16.79 y and y = .023 x. This is done by
locating two or more points of X on Y and Y on X by the use
of the following formulae:

For Y on X

$$y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}),$$

where y and x = any values of the correlated variables, and
$\bar{y}$ and $\bar{x}$ are the means of the respective series.

Inserting values in this equation we get

$$y - 1.79 = .023 \, (x - 38.6)$$

For X on Y the corresponding formula is

$$x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$$

Inserting values in this equation we get

$$x - 38.6 = 16.79 \, (y - 1.79)$$
Then solving for values of y with different values of x, and for
values of x for different values of y, we get

Regressions of Y Series        Regressions of X Series
on X Series                    on Y Series

   x         y                    y         x

   30       1.59                 1.00      25.34
   40       1.82                 2.00      42.13
   50       2.05                 3.00      58.92
  etc.      etc.                 etc.      etc.

When x increases by 10, y increases by 10 × .023 or .23.
When y increases by 1.00, x increases by 16.79.

By using these relations, the AB line (regression of X on Y)
and the CD line (Y on X) are drawn.
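The points used to draw the two lines can be regenerated from the regression equations. The sketch below simply evaluates them with the means and coefficients quoted in the text; the function and constant names are illustrative assumptions.

```python
MEAN_X = 38.6    # average rent per 100 sq. ft. of floor space
MEAN_Y = 1.79    # average rent per $100 of sales
B_YX = 0.023     # regression coefficient of Y on X, r * sigma_y / sigma_x
B_XY = 16.79     # regression coefficient of X on Y, r * sigma_x / sigma_y


def y_on_x(x):
    """Most probable y for a given x (line CD)."""
    return MEAN_Y + B_YX * (x - MEAN_X)


def x_on_y(y):
    """Most probable x for a given y (line AB)."""
    return MEAN_X + B_XY * (y - MEAN_Y)


cd_points = [(x, round(y_on_x(x), 2)) for x in (30, 40, 50)]
ab_points = [(y, round(x_on_y(y), 2)) for y in (1.00, 2.00, 3.00)]
print(cd_points)
print(ab_points)
```

Both lines pass through the point of the two means, so evaluating `y_on_x(38.6)` returns 1.79 and `x_on_y(1.79)` returns 38.6.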

The regression coefficient is therefore a fixed ratio between
the deviations of attributes in correlated series whereby it is
possible, if the amount is known by which the attribute in one
series deviates from the mean, to predict the extent to which
the associated attribute will most probably deviate from its
mean. The extent of deviation in each series is indicated in
its own unit of measurement. Prediction, of course, rests upon
the law of probability and theory of error already discussed.¹

(5) The Probable Error of the Coefficient of Correlation

Is the amount of negative correlation between the load
factor and income per K.W.H., and the amount of positive
correlation between rent and sales and rent and floor space
“significant”? A similar question was asked² about individual
measurements and the means of a series of measurements.
The answer was found in the probable error concept. It was
said that the probable error is a measure which if added to
and subtracted from a most probable measurement (mean
in the case of an individual measurement; average of a series
of means for a mean) gives amounts within which the chances
are even that an item of the same type, if selected at random,
will fall.

¹ For an excellent discussion of regression lines and coefficients see
Rugg, H. O., Statistical Methods Applied to Education, Houghton Mifflin,
1917, pp. 252-259.

² See p. 370 f.

The correlation coefficient too has a probable error. It is 
that amount on either side of the average coefficient of cor- 
relation within which half of the values of a large number of 
coefficients fall if computed from series of pairs of items 
chosen at random from a universe having in general the given 
correlation coefficient. That is, if from a large population 
successive pairs of samples were drawn at random and their 
correlation coefficients determined, the results would differ. 
They, however, would tend to describe the normal probability 
curve, being systematically distributed about a mean. The 
probable error of r, therefore, is an amount which if added to 
and subtracted from the average correlation coefficient pro- 
duces amounts within which the chances are even that a 
coefficient of correlation from a series selected at random will 
fall. 

The formula for the probable error of Pearson's coefficient
of correlation, r, is $.6745 \, \frac{1 - r^2}{\sqrt{n}}$, where n is the number of
items paired, and r the coefficient itself. The amount secured
from this formula is a function of the size of the coefficient,
r, and the number of items.

It has become conventional to say that for r to be significant
it must be at least six times its probable error. Under such
circumstances the odds are large that another coefficient computed
from series selected at random would fall within a range
above and below the mean set by such an amount. Judged by
this standard, both correlation coefficients are significant. The
coefficient, -0.517, between the load factor and K.W.H. is
more than seven times its probable error, .0721. The coefficient
for rental payments in terms of sales and in units of
floor space, +.63,¹ is approximately twenty times its probable
error, .033. The coefficients with their probable errors written
in the customary manner are as follows:

Load factor and K.W.H.: r = -0.517 ± .0721

Rents in units of sales and in units of floor space: r =
+.63 ± .033.

¹ By another computation it is +.64; see Table 72.
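The probable errors quoted above follow directly from the formula. A minimal sketch, assuming the helper names `probable_error` and `is_significant` (they are not from the text) and using the conventional multiple of six:

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of Pearson's r: .6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


def is_significant(r, n, multiple=6):
    """Conventional test: |r| at least `multiple` times its probable error."""
    return abs(r) >= multiple * probable_error(r, n)


pe_load_factor = probable_error(-0.517, 47)    # load factor and K.W.H.
pe_rents = probable_error(0.63, 150)           # rents, sales and floor space

print(round(pe_load_factor, 4), round(0.517 / pe_load_factor, 1))
print(round(pe_rents, 3), round(0.63 / pe_rents, 1))
```

Since the probable error shrinks with the square root of n, the same coefficient computed from fewer pairs may fail the six-times test; for example, `is_significant(0.517, 10)` is false.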

2. THE CONCURRENT DEVIATION METHOD

If a measure of association in the direction of change alone 
is desired, the method of concurrent deviations may be used. 

Table 73 is composed of four primary sections. In the
upper left-hand corner the stores which had expenses above
the average in the first year¹ and also in the second year are
tabulated in classified groups according to per cents by which
their expenses exceed the averages in the respective years.
The upper right-hand corner contains the stores having expenses
greater than the average in the first and less than the
average in the second year, the deviations being shown in the
same manner as in the quarter just described. Similarly,
stores having expenses less than the average in the first and
greater than the average in the second year are listed in the
lower left-hand corner. The lower right-hand corner contains
stores the expenses of which in both years were less than the
average. Such an arrangement constitutes a four-part “double
frequency” table.

An inspection of the table indicates that stores which had
expenses higher or lower than the average in the first year
generally had expenses higher or lower than the average in
the second year. A few stores, however, the expenses of which
were higher or lower than the average in the first year had
expenses lower or higher in the second year. In no one of the
four sections is there complete identity as to the amount of
the difference of the expenses from the average for the stores
in the first and in the second year.

¹ By “first” and “second” years are meant the first and second of a pair
of years, as 1916 and 1917, 1917 and 1918, etc. Table 73 is the summation
of such distributions for the four pairs of years, 1916 to 1920,
inclusive.
TABLE 73

Number of Identical Retail Clothing Stores Distributed According
to the Amount and Type of Their Expense Deviations
from the Average in Two Successive Years


The degree of correlation between the positions of the stores
relative to the averages in the first and second years of each
pair may be measured by the formula:¹

$$r = \pm \sqrt{\pm \frac{2c - n}{n}}$$

where r is the coefficient of correlation;
c, the number of pairs having like signs; and
n, the number of pairs of items.

¹ If the quantity, 2c - n, is negative, a minus sign is used before it and
before the radical so that the square root can be taken.
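A sketch of the concurrent deviation computation follows. The sign handling is the one described in the footnote (a negative 2c - n is made positive under the radical and the result is given a minus sign). The function names and the counts used in the example are hypothetical, since the counts of Table 73 are not reproduced here.

```python
from math import copysign, sqrt


def count_like_signs(first_devs, second_devs):
    """Number of pairs whose deviations from their averages have like signs."""
    return sum(1 for a, b in zip(first_devs, second_devs) if a * b > 0)


def concurrent_deviation_r(c, n):
    """Coefficient of concurrent deviations from c like-signed pairs out of n."""
    quantity = (2 * c - n) / n
    # Take the root of the absolute value, then restore the sign.
    return copysign(sqrt(abs(quantity)), quantity)


# Hypothetical example: 80 of 100 pairs deviate in the same direction.
r_example = concurrent_deviation_r(80, 100)
print(round(r_example, 2))
```

When every pair has like signs the coefficient is +1, when none does it is -1, and when exactly half do it is zero, so the measure reflects direction of change only and not its amount.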

The association of positions relative to the averages may be
summarized as follows:


Second of the Pairs of Years 



Pearson's r, which measures not only the direction but also
the amount of deviation relative to the average, gives a value
of +.74 ± .012.

3. GRAPHIC METHODS OF SHOWING ASSOCIATION BETWEEN 
DIFFERENT VARIABLES 

Figure 79 shows an inverse relation between the amount of 
annual sales in retail clothing stores and the size of inven- 
tories per unit of sales. 

Figure 80 shows a direct relation between the amount of 
annual sales in retail clothing stores and the annual rates of 
stock turnover. 
FIGURE 79

Amounts of Inventory per $100 of Total Net Sales for Stores
Classified by Size, 1919, 1918, and 1914, Combined

Net Sales          Number of    Inventories per
(in 000's)         Stores       $100 of Sales

Total (Average)      920          $38.00

Under $20             50           70.67
$20 to $40           239           53.29
$40 to $60           209           46.73
$60 to $80           126           44.53
$80 to $100           80           41.40
$100 to $140          95           39.43
$140 to $180          43           36.67
$180 to $220          21           29.00
$220 to $300          23           29.49
$300 to $500          22           26.57
$500 & over           12           25.75

Under $40            289           54.97
$40 to $80           335           45.74
$80 to $180          218           39.24
$180 & over           78           27.24

(Bar chart: inventories per $100 of sales as a per cent of the
average, $38.00.)


Figure 81 shows an essentially constant relation between the 
amount of annual sales in retail clothing stores and the 
amount paid in wages and salaries as a per cent of total op- 
erating expense. 


V. Conclusion 

The discussion of correlation in this chapter has had to do 
with its meaning and application under the assumption of the 
normal law of error distribution. It was in keeping with 
such assumption that the Pearsonian coefficient was conceived,
and it is only in this connection that the formula accurately 
measures correlation. 


FIGURE 80

Annual Rates of Stock Turnover for Stores Classified by
Size, 1919

Net Sales          Number of    Annual Stock
(in 000's)         Stores       Turnover Rates

Total (Average)      314           2.1

Under $20              3           1.2
$20 to $40            43           1.5
$40 to $60            77           1.7
$60 to $80            45           1.9
$80 to $100           34           1.9
$100 to $140          45           2.0
$140 to $180          22           1.9
$180 to $220          12           2.3
$220 to $300          12           2.6
$300 to $500          14           2.7
$500 & over            7           2.9

Under $40             46           1.4
$40 to $80           122           1.8
$80 to $180          101           1.9
$180 & over           45           2.7

(Bar chart: annual stock turnover rates as a per cent of the
average, 2.1.)



Bowley's summary of his discussion of correlation may be
used to close our own.

“We may now sum up the treatment of correlation so far. If
(x, y) is a pair of measurements (from their averages) of two
variables (related in space, in time, in a thing or in an organism), and if
when x is given as positive (or negative) there is a presumption that
y is positive (or negative), or a presumption that y is negative (or
positive), then the variables are said to be correlated. In such a
case $\frac{\Sigma xy}{n}$ does not tend to zero when n is increased, but to a limit
written as $r \sigma_x \sigma_y$. r = 0, = 1, = -1 have definite meanings; r is
sensitive to all kinds of relationship between x and y. In general it
may be expected to be the greater as $\sigma_a$ (the mean scattering within
the arrays) is less. If x and y are each the sum of p + q independent
elements of which p (only) are common to x and y, then r equals
p/(p + q), if the standard deviations of the elements are equal. If
x and y are generated linearly from a multiplicity of independent
causes (some of them common to x and y), then r defines the whole
frequency distribution of the pairs, the regression loci are rectilinear,
and their equations are $y = r \frac{\sigma_y}{\sigma_x} x$, and $x = r \frac{\sigma_x}{\sigma_y} y$. If the normal
frequency surface cannot be assumed, but regression is rectilinear,
the same equation is a good empirical statement of regression. If
nothing can be postulated as to the distribution of x and y or the
averages of the arrays, the meaning of the numerical value of r is
undefined. ... In general, however, r may be said to measure the
amount that is common in the systems of causation of x and y.”

FIGURE 81

Amounts of Wages and Salaries per $100 of Total Expense for
Stores Classified by Size, 1919, 1918, and 1914, Combined

Net Sales          Number of    Wages and Salaries per
(in 000's)         Stores       $100 of Total Expense

Total (Average)      929          $55.23

Under $20             48           56.30
$20 to $40           244           55.87
$40 to $60           214           54.54
$60 to $80           130           55.85
$80 to $100           82           55.22
$100 to $140          90           54.96
$140 to $180          44           58.26
$180 to $220          23           57.22
$220 to $300          23           53.75
$300 to $500          21           53.20
$500 & over           10           34.87

Under $40            292           55.92
$40 to $80           344           65.17
$80 to $180          216           55.97
$180 & over           77           54.50

(Bar chart: wages and salaries per $100 of total expense as a
per cent of the average, $55.23.)
CHAPTER XIV


THE TREATMENT AND CORRELATION OF
TIME SERIES

I. Introduction

The graphic representation of time or historical series was
discussed in Chapter VIII. In that connection, attention was
given primarily to (1) the methods of drawing simple and
cumulative graphs, (2) scale conversion, (3) difference vs.
ratio charts, (4) simple methods of smoothing time series,
etc. Further attention to time series was reserved for this
chapter because of the intimate relation of the subject to
correlation, and to the discussion which must of necessity precede
it. Having now described the different methods of summarizing
and comparing statistical series in terms of averages
and of measures of dispersion and of skewness; and having
stated the concepts of probability, and the theory of error
and correlation, we are now ready to discuss the statistical
treatment and correlation of time series.

II. The Nature of Changes in Time Series 

The most satisfactory way of showing the changes of a
series of data over a period of time is to use a graph or line
chart. The time intervals (days, months, years, etc.) are
plotted along the abscissa axis, the spacings being proportional
to the length of time covered. At the different time units,
ordinates are erected according to a scale showing absolute
amounts or ratio changes. A line connecting the successive
ordinates gives a graphic picture of the ups and downs,
increases, decreases, and general trend which characterize such
series. If nothing more than a general picture of the short-
and long-time movements is desired, smoothed lines drawn
free-hand or by a process of averaging will suffice. Indeed,
any number of series may be roughly compared in this manner.
It is only when comparison requires that different types of
changes be isolated that more refined methods are needed.

Figure 82 shows the note circulation of chartered Canadian
banks and wheat receipts at Fort William and Port Arthur,
Canada, from 1909 to 1913, (1) as actual amounts and (2) as
average amounts secured by using a moving average of thirteen
months, centered at the seventh month. The lines plotted
to the respective averages roughly indicate the trends, while
those showing the actual amounts reveal the seasonal changes.
Neither of the graphs, however, satisfactorily measures the
trend or the seasonal movements. More refined methods are
necessary.*

FIGURE 82

Curves Showing Long-Time or Secular Changes

(Note Circulation of Canadian Chartered Banks, and Wheat Receipts at Fort William
and Port Arthur, Canada, by Months, 1909-1913.)

* See the discussion under Section III, infra.
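The thirteen-month moving average mentioned in connection with Figure 82 can be sketched as follows. The function name and the short artificial series are assumptions for illustration only; the monthly bank-note and wheat figures themselves are not given in the text.

```python
def centered_moving_average(values, window=13):
    """Moving average of `window` items, each placed at the middle item.

    With window=13 the average of months 1 to 13 is centered at the
    seventh month. Positions too near either end, where a full window
    is not available, are left as None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a middle item")
    half = window // 2
    averages = [None] * len(values)
    for i in range(half, len(values) - half):
        averages[i] = sum(values[i - half:i + half + 1]) / window
    return averages


# Artificial series of 15 monthly amounts: 1, 2, ..., 15.
smoothed = centered_moving_average(list(range(1, 16)))
print(smoothed)
```

Six months are lost at each end of the series, which is one reason such a smoothed line gives only a rough indication of the trend.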
The changes in time series may in general terms be spoken
of as (1) long-time or secular, and (2) short-time. By a
secular change is meant one which characterizes the direction
over a number of years. There may be a general tendency
for amounts to increase, to decrease, or to assume both
directions. The short-time changes are of a periodic or of an
irregular type and of relatively short duration.

The long-time change may sometimes be generalized into
a trend, and be represented by a straight line drawn through
the data rather than following the movement characterizing
the “short” periods. Such a trend line, if positively inclined,
shows a tendency for the series to increase; if it is negatively
inclined, a tendency for it to decrease. The forces back of
such long-time trends in series relating to business, industry,
social development, etc., are increases in population,
improvements in sanitation and health, industrial growth, exhaustion
of natural resources, improvements in standards of living,
perfection of the arts, and numerous other influences which
operate steadily and persistently from year to year.

The short-time changes may be classified into three groups: 
(1) those which are of a seasonal nature, (2) those which are 
cyclical, and (3) those which may be termed accidental or 
extraordinary. 

The seasonal changes are those which are traceable to forces 
inherent in the seasons themselves. They may be due to 
meteorological factors such as rainfall and temperature; to 
demands incident to crop planting, moving and marketing; to 
fad and fashion in dress; to shifts in population from unfa- 
vorable to favorable climates; to conventional practices of 
debt liquidation, payment of interest on bonds, taking of 
vacations — in fact to any circumstances peculiar to the sea- 
sons as such. Accordingly, in some series they are marked; 
in others negligible. 

By cyclical changes are meant those swings in business
through periods of expansion, liquidation, depression, and re-
covery, which have come to be known as “the business cycle.”

By accidental changes or movements are meant those which
cannot be traced (1) to the steady influences of growth or de-
cline, (2) to seasonal adjustments and variations, or (3) to
the rhythmical influences of the business cycle. They are
rather due to fortuitous events such as wars, strikes, floods,
earthquakes, etc.

III. Methods of Measuring and Isolating Time Changes 

Having classified and described the different kinds of
changes in time series, the more important methods by which
they can be isolated and measured will now be considered.¹
For the purpose of illustrating different methods, the time
series showing the monthly production of pig iron from 1903
to 1916 will be used. The amounts are contained in Table 74.²


TABLE 74

Monthly Production of Pig Iron in the United States
(000's of long tons)

Yrs.   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.   Ave.

1903   1472   1390   1590   1608   1713   1673   1546   1571   1553   1425   1039    846   1452
1904    921   1205   1447   1555   1534   1292   1106   1167   1352   1450   1486   1616   1344
1905   1781   1597   1936   1922   1963   1793   1741   1843   1899   2053   2014   2045   1882
1906   2068   1904   2155   2073   2098   1976   2013   1926   1960   2196   2187   2235   2066
1907   2205   2045   2226   2216   2295   2234   2255   2250   2183   2336   1828   1234   2109
1908   1045   1077   1228   1149   1165   1092   1218   1348   1418   1563   1577   1740   1302
1909   1801   1703   1832   1738   1880   1929   2101   2246   2385   2600   2547   2635   2116
1910   2608   2397   2617   2483   2390   2265   2148   2106   2056   2093   1909   1777   2237
1911   1759   1794   2188   2065   1893   1787   1793   1926   1977   2102   1999   2043   1944
1912   2057   2100   2405   2375   2512   2440   2410   2512   2463   2689   2630   2782   2448
1913   2795   2586   2763   2752   2822   2628   2560   2543   2505   2546   2233   1983   2560
1914   1885   1888   2348   2270   2093   1918   1958   1995   1883   1778   1518   1516   1921
1915   1601   1675   2064   2116   2263   2381   2563   2780   2853   3125   3037   3203   2472
1916   3185   3087   3338   3228   3351   3212   3226   3204   3202   3509   3312   3171   3252


¹ Much of the following discussion is based upon the work of Professor
W. M. Persons, Editor, The Review of Economic Statistics, Harvard
Economic Service, Cambridge, Mass., to whom all students of the business
cycle and of statistical methods are deeply indebted. His unique contri-
butions not only to the methods of isolating the different changes in time
series but also to the use of the correlation coefficient in the development
of a business barometer and forecaster are outstanding events in the
development of statistical methods during the past ten years.

² Details taken from Review of Economic Statistics, Harvard Com-
mittee on Economic Research, January, 1919, p. 66.

Graphic representations of the actual amounts of pig iron
produced and of the long-time trend³ are given in Figure 83.

FIGURE 83 

Chart Showing the Actual Production of Pig Iron in the
United States, 1903 to 1916, and a Line Showing
the Long-Time Trend *

* Reproduced by courtesy of the Editors of the Review of Economic
Statistics, Harvard Committee on Economic Research, Cambridge, Mass.

An inspection of the curve of actual data in Figure 83 shows 
(1) a long-time tendency for production to increase; (2) more 
or less periodic rises and falls several years apart; and (3) 
ups and downs from month to month in each year. The curve 
seems to contain a definite trend, as well as cyclical and sea- 
sonal movements. But what is a high point for one period is 
a low position for another period, and vice versa. Moreover, 
the large swings through which the curve passes are blurred 
by the seasonal changes. It is only by isolating the different 
movements that a true picture of what happened in production 
during these years can be secured. Methods of doing this will 
now be explained. 

1. METHODS OF MEASURING LONG-TIME OR SECULAR TREND 

³ For the way in which the line of long-time trend in Figure 83 is
secured see the discussion, pp. 444-447.

To determine a trend in historical data presupposes a pe-
riod for which the trend is to be found. Moreover, the limit-
ing term, “long-time,” suggests that the trend is thought of
as being characteristic (typical or normal) of a period long
enough for the influences determining it to work themselves 
out. Accordingly, the choice of a period requires (1) that
as many years as possible should be considered,⁴ (2) that pe-
riods of evident change in trend be excluded,⁵ and (3) that
periods of violent change from wars, major strikes, etc. (the
“accidental” phases of business growth and decline) be omitted.

The period for which the trend is sought, therefore, cannot 
be studied too carefully. The addition or the elimination of 
a year or a number of years may materially change the trend 
if these conditions are not observed.⁶

From an inspection of Figure 83, it appears that for the 
production of pig iron in the United States the period 1903- 
1916 may be used in order to secure a measure of long-time 
trend. 


(1) The Free-Hand Method 

A line drawn free-hand through the amounts showing
monthly production might serve to give a general notion of
the direction of change. Where it is to be drawn, however, is
a matter of judgment. Different people would draw it at
different positions and with varying slopes. If the trend
when drawn is to be used as a base from which both sea-
sonal and cyclical variations are to be determined, then its posi-
tion should not be made a matter of opinion, but so far as
can be of mathematical certainty. Accordingly, other than
free-hand methods are necessary, although a line drawn in
this fashion may be taken as a first approximation to a line
upon which all could agree, and one which would rest upon
an acceptable mathematical formula.

⁴ If the trend is to be used in connection with a study of the business
cycle, the period should begin and end in the same phase (prosperity,
liquidation, depression, recovery) of the cycle.

⁵ That is, if a straight line is to be fitted to the data. In some cases
some form of a curved line is necessary. Persons' judgment, after having
examined a great number of statistical series relating to business and
economic phenomena, however, is of interest. “It may be said that for
over 95 per cent of economic series it is not worth while to search for
a more complicated functional expression between the variables than one
of the first degree” (a straight line). Persons, W. M., Review of
Economic Statistics, April, 1919, p. 135.

⁶ See the discussion of this phase of the problem by Persons, W. M.,
Review of Economic Statistics, Harvard Committee on Economic Re-
search, January, 1919, pp. 8-18.

(2) The Method of Averaging

A trend line may also be determined by using some form 
of averaging. But different averages give different results as 
do also the same averages of different periods. As Persons 
says, after an exhaustive analysis of the use of moving aver- 
ages, “It is clear . . . that the use of moving averages does 
not eliminate the secular trend of the original series. The re- 
sulting averages present the problem with which we started, 
the measurement and elimination of the trend for the period 
in question.”⁷

There is, however, something to be said for the use of mov- 
ing medians, more particularly when it is certain that the 
trend does not follow some mathematical law. The medians 
serve as a first approximation to the line sought, correction 
from which can be made by some appropriate smoothing 
device.⁸
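
The difference between a moving average and a moving median may be seen from a short computation. The following Python sketch is an illustration only and is not taken from Persons or King: the five-year window, the use of the annual averages of Table 74 as the series, and the helper name `moving` are all assumptions made for the example.

```python
from statistics import mean, median

def moving(series, window, summarize):
    """Apply `summarize` to every run of `window` consecutive items."""
    return [summarize(series[i:i + window])
            for i in range(len(series) - window + 1)]

# Annual averages of monthly pig-iron production, 1903-1916 (Table 74, "Ave.").
annual = [1452, 1344, 1882, 2066, 2109, 1302, 2116,
          2237, 1944, 2448, 2560, 1921, 2472, 3252]

# Hypothetical choice: a five-year window, each value centered on its middle year.
moving_mean_5 = moving(annual, 5, mean)
moving_median_5 = moving(annual, 5, median)

for year, avg, med in zip(range(1905, 1915), moving_mean_5, moving_median_5):
    print(year, round(avg, 1), med)
```

The depressed year 1908 pulls down every five-year mean that contains it, while the medians of the same windows are unaffected by how low that one year falls. Neither series removes the upward drift of the data, which is the point of the passage quoted from Persons.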


(3) The Least-Square Method

The line of “best fit” of a series of points was found in
Chapter XIII to be the line from which the sum of the
squares of the deviations of the items, measured parallel to
the Y axis, is a minimum.⁹ Such a line passes through the
arithmetic means of the X and of the Y series. The slope, $m$,
of the line of regression (also of least squares) is
$r\,\dfrac{\sigma_y}{\sigma_x}$, where $r$ is the coefficient of
correlation, $\sigma_y$ the standard deviation of the Y series, and
$\sigma_x$ the standard deviation of the X series. (In time series
the X series represents time.) But, since
$r = \dfrac{\sum xy}{n\,\sigma_x \sigma_y}$, $m$ reduces to
$\dfrac{\sum xy}{\sum x^2}$.¹⁰ Therefore, the line of best fit
(least squares) may be thought of as the regression line of Y
(the items) on X (the time).

⁷ Op. cit., p. 12.

⁸ For a defense of the use of the moving median, see King, W. I., “Prin-
ciples Underlying the Isolation of Cycles and Trends” in Journal of the
American Statistical Association, December, 1924, pp. 468-475.

⁹ The Pearsonian coefficient of correlation is based upon this principle.
That is, the slope of the line of regression of X on Y and of Y on X, when
the deviations of the items in each of the series from its arithmetic mean
are expressed in units of standard deviation, gives $r$.

Table 75 contains the average monthly totals, the Y series,
and the years, the X series, from which the slope of the line
in Figure 83 is derived.

The middle point (time) in X is halfway between Decem-
ber, 1909, and January, 1910. The middle amount correspond-
ing to this time is $\dfrac{29{,}105}{14} = 2078.9$. The annual increment is
$\dfrac{43{,}359}{910 \div 2} = 95.3$. The monthly increment is, therefore,
$\dfrac{95.3}{12} = 7.9$. The annual increment is the amount by which the
trend line (see Figure 83) rises from year to year, and the
monthly increment, the amount by which it rises from month
to month.

Now from the slope ($m = 7.9$, the monthly increment) it is only
necessary to find the ordinates of the trend. This is done
as follows: The middle of the period, 1903-1916, is halfway
between December, 1909, and January, 1910. The middle
amount corresponding to this period is 2078.9. Accordingly,
to get the ordinate for December, 1909, it is necessary to sub-
tract one half of the monthly increment, that is,
$\dfrac{7.9}{2} = 4$, from 2078.9, which gives 2074.9, or 2075 in round numbers.
Then with December, 1909, as a starting point subtract suc-
cessively the annual increments to get the December ordinates
of trend for the previous years, and add them successively to
get the December ordinates of trend for the following years.¹¹

¹⁰ $r = \dfrac{\sum xy}{n\,\sigma_x \sigma_y}$. Accordingly,
$m = \dfrac{\sum xy}{n\,\sigma_x \sigma_y} \cdot \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum xy}{n\,\sigma_x^2}$.
But $\sigma_x = \sqrt{\dfrac{\sum x^2}{n}}$. Accordingly,
$m = \dfrac{\sum xy}{\sum x^2}$.

TABLE 75

Monthly Production of Pig Iron 1903-1916
(000's of tons)

(Showing Method of Determining Monthly Increment of Trend)

    1           2              3                4                5
  Years     Production     Deviation      Deviations in
            Monthly        in Series *    Series X Squared
    X           y              x                x²               xy

  1903        1452           -13              169            -18876
  1904        1344           -11              121            -14784
  1905        1882            -9               81            -16938
  1906        2066            -7               49            -14462
  1907        2109            -5               25            -10545
  1908        1302            -3                9             -3906
  1909        2116            -1                1             -2116
  1910        2237             1                1              2237
  1911        1944             3                9              5832
  1912        2448             5               25             12240
  1913        2560             7               49             17920
  1914        1921             9               81             17289
  1915        2472            11              121             27192
  1916        3252            13              169             42276

  Total      29105             0           Σx² = 910       Σxy = 43359

* In order to avoid fractions, since the deviations are taken from the
middle of 1909-1910, whole numbers are used, and the Σx² (910) is later
divided by 2.


¹¹ Of course the trend line can be plotted from any two ordinates as thus
determined and the other amounts read directly from the ordinate scale.

It is in this manner that the line of trend in Figure 83 is
determined.¹² The actual amounts are shown in column 4 of
Table 77.
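
The arithmetic of Table 75 can be checked with a few lines of Python. This is a sketch under stated assumptions rather than part of the original method: the function name `trend` and the convention of placing each monthly ordinate at mid-month are choices made for the illustration, although they follow the description above (the December, 1909, ordinate lies half a monthly increment below the middle of the period).

```python
# Annual averages of monthly pig-iron production (000's of tons), Table 75.
years = list(range(1903, 1917))
y = [1452, 1344, 1882, 2066, 2109, 1302, 2116,
     2237, 1944, 2448, 2560, 1921, 2472, 3252]

# Deviations from the middle of the period in half-year units
# (-13, -11, ..., 11, 13), which avoids fractions as in Table 75.
x = [2 * (yr - 1903) - 13 for yr in years]

sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

mid_level = sum(y) / len(y)               # ordinate at the middle of the period
annual_increment = 2 * sum_xy / sum_x2    # x is in half-years, so the slope is doubled
monthly_increment = annual_increment / 12

def trend(year, month):
    """Trend ordinate for a month (1-12), counted from mid-December 1909."""
    december_1909 = mid_level - monthly_increment / 2
    months_from_december_1909 = (year - 1909) * 12 + (month - 12)
    return december_1909 + months_from_december_1909 * monthly_increment

print(sum_x2, sum_xy)                                   # the totals of Table 75
print(round(mid_level, 1), round(annual_increment, 1))  # middle amount, annual increment
print(round(trend(1903, 1)), round(trend(1907, 12)))    # two ordinates of column 4, Table 77
```

Doubling the slope is the same operation as dividing Σx² by 2 in the footnote to Table 75; both convert a slope per half-year into a slope per year.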

The line of trend according to the least-square assumption 
is a “best fit” only for the period to which it applies. The
addition of other years or the elimination of some already 
taken may radically change its position. Moreover, this line 
can rarely be extended to cover future years because nothing 
is known about the condition these years will bring. There 
is no method of stating, as there is in frequency series, the 
probable dispersion of additional data. As Persons well says: 

“The method of curve-fitting is superior to the method of moving
averages for measuring secular trend. The determination of a curve
or line which pictures the secular trend of a past period, does not
determine present or future trend. The presumption that past trend
will continue is strong in some cases and weak in others. The
estimate of future trend should be influenced by recent tendencies
and current items to some degree, yet we should not lightly conclude
from short-time fluctuations that secular trend has changed. . . .
The extension of a past trend is a prophecy. It is impossible to get
away from that fact. The important thing is that the exact nature
of the prophecy be made unmistakable.”¹³


The trend is eliminated from the actual items month by
month by expressing the items as percentages of the trend.
That is, the trend is taken as a base from which the actual
items appear as plus (+) or minus (-) deviations. The per-
centage relations of the items to trend are shown in column
5 of Table 77, and illustrated by the heavy line in Figure 84.
This line shows the production (as percentages) corrected for
long-time trend.
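
As a check on column 5 of Table 77, the percentages of trend for 1903 may be recomputed from columns 3 and 4. The short Python sketch below is an illustration only; the variable names are hypothetical, and the figures are those printed in Tables 74 and 77.

```python
# Column 3 (actual production) and column 4 (trend ordinates) of Table 77
# for the twelve months of 1903, in 000's of tons.
production_1903 = [1472, 1390, 1590, 1608, 1713, 1673,
                   1546, 1571, 1553, 1425, 1039, 846]
trend_1903 = [1416, 1424, 1432, 1440, 1448, 1456,
              1463, 1471, 1479, 1487, 1495, 1503]

# Column 5: each item expressed as a percentage of its trend ordinate.
percent_of_trend = [round(100 * p / t, 1)
                    for p, t in zip(production_1903, trend_1903)]

# The same figures as plus or minus deviations from the trend taken as 100.
deviations = [round(p - 100, 1) for p in percent_of_trend]

print(percent_of_trend)
print(deviations)
```

A value above 100 means that production in that month stood above the line of trend; a value below 100, that it stood below.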

¹² For a description of another method of determining the annual incre-
ment of trend for a straight line which gives the same result as the
method of “least squares,” see Frickey, Edwin, “The Line of Secular
Trend,” The Review of Economic Statistics, April, 1919, pp. 210-211.

¹³ Persons, W. M., The Review of Economic Statistics, January, 1919,
p. 18.

FIGURE 84

Pig Iron Production 1903-1916: Figures Corrected for Long-
Time Trend (Percentages) *

* Reproduced by courtesy of the Editors of the Review of Economic
Statistics, Harvard Committee on Economic Research, Cambridge, Mass.

2. METHODS OF MEASURING NORMAL SEASONAL CHANGE

Before measuring seasonal changes, the fact that they exist 
must first be determined. It is apparent that it is useless to 
expect a perfect repetition year after year of seasonal swings. 
Variation characterizes our industrial and social world as it 
does such pure chance phenomena as dice throws, for instance. 
Having noted the fact of seasonal change (which may be done
from a graphic representation of data) the problem is to
secure some measure of the normal or characteristic changes
which tend to be repeated year after year. To do this some
form of averaging (that is, of reducing detail and varia-
tion to type) must be used. But different methods
give different results. Which are most satisfactory and
why?¹

¹ The literature on this subject is extensive, and new methods and dis-
cussions and criticisms of old ones are constantly appearing. All that
can be done in a textbook is to describe briefly the more important
methods, and refer students to more detailed and elaborate treatments of
the subject. See, for instance, Persons, W. M., “Indices of Business
Conditions,” Review of Economic Statistics, Cambridge, Mass., Jan.,
1919, pp. 18-31; King, W. I., “An Improved Method for Measuring the
Seasonal Factor” in Journal of the American Statistical Association,
September, 1924, pp. 301-313; Falkner, H. D., “The Measurement of
Seasonal Variation,” Journal of the American Statistical Association,
June, 1924, pp. 167-179, and the literature there referred to.

(1) Monthly Means or Averages

If data by months are available over a series of years, and 
it is desired to get a measure of the normal seasonal varia- 
tion in the items, the simplest method would appear to be to 
take an average of some sort of the amounts of the Januaries, 
the Februaries, etc., and to express them as percentages of 
their own average. But such a method makes no allowance 
for the long-time trend, for cyclical movements, nor for acci- 
dental disturbances. Moreover, the use of the arithmetic 
mean gives prominence to the exceptional items, and this is 
not desired, since what is sought is a picture of the normal 
seasonal change. This method has little to commend it except 
the ease with which it may be carried out.²
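
The method of monthly means is simple enough to set out in a few lines. The Python sketch below is an illustration only: it uses three rows of Table 74 (1909 to 1911) instead of all fourteen, purely to keep the example short, and the variable names are hypothetical.

```python
# Three rows of Table 74 (000's of long tons), chosen only for brevity.
production = {
    1909: [1801, 1703, 1832, 1738, 1880, 1929, 2101, 2246, 2385, 2600, 2547, 2635],
    1910: [2608, 2397, 2617, 2483, 2390, 2265, 2148, 2106, 2056, 2093, 1909, 1777],
    1911: [1759, 1794, 2188, 2065, 1893, 1787, 1793, 1926, 1977, 2102, 1999, 2043],
}

# Average of the Januaries, of the Februaries, and so on.
monthly_means = [sum(column) / len(column)
                 for column in zip(*production.values())]

# Express each monthly mean as a percentage of the average of the twelve means.
overall = sum(monthly_means) / 12
seasonal_index = [100 * m / overall for m in monthly_means]

print([round(s, 1) for s in seasonal_index])
```

Because nothing in the computation separates trend or cycle from season, the steep rise of 1909 and the decline of 1910 enter the indexes directly, which is the weakness noted above.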

(2) The Method of Moving Medians 

King³ has recently suggested a method of measuring the
seasonal factor which seems to have considerable merit. The 
steps in its use are as follows: 

a. Plot the original monthly data of the series to be studied. 

b. Draw a free-hand curve through the cycles representing
as nearly as can be what the data would be if there were no
seasonal changes.

c. Read from the curve drawn in “b” the figures each month
representing the tentative estimate of the cycle amounts.

d. Divide each of the monthly amounts in “a” by those
secured in “c.”

e. Take moving medians (King used one covering nine 
periods) of the percentages for the Januaries, for the Febru- 
aries, etc., and plot them to the middle year of the period. 

f. Adjust the percentages for the months in each year so 
that their sum equals twelve. 

² See Davies, G. R., Introduction to Economic Statistics, Century Co.,
New York, 1922, pp. 116-120 for a discussion of this method, and for a
modification of it which eliminates many of its weaknesses.

³ King, W. I., “An Improved Method of Measuring the Seasonal Factor”
in Journal of the American Statistical Association, September, 1924, pp.
301-313.

King holds that this method is superior to others because (1)
it is easy to understand, (2) can be computed easily, and
(3) gives a separate seasonal index for each year during the
period treated.⁴


(3) The Median-Link-Relative Method⁵


The median-link-relative method of measuring normal sea-
sonal change makes use of an average (the median) and
monthly relative numbers calculated on a shifting base. The
steps in its use are as follows:

a. From the original monthly items calculate relative or
percentage numbers for each month by dividing the amount
for each month by the amount for the preceding month and
multiplying the result by 100. For instance,
$\dfrac{\text{January}}{\text{December}} \times 100$ gives the January relative;
$\dfrac{\text{February}}{\text{January}} \times 100$ gives the February relative;
$\dfrac{\text{March}}{\text{February}} \times 100$ gives the March relative;
and so on through the entire series.

b. Arrange the relative numbers in the form of a frequency
table for each pair of months, as $\dfrac{\text{January}}{\text{December}}$,
$\dfrac{\text{February}}{\text{January}}$, etc.
There will then be as many frequencies for each pair as there
are years in the period covered. In the case of pig iron, since
the years 1903 to 1916, inclusive, are used there are fourteen.
A frequency table arrangement shows the dispersion of the
relatives and helps one to decide whether to take a median
of all of the items or an average of those near the median.
The relatives for pig iron are shown in tabular form in Table
76.


⁴ For a discussion of the steps which involve a certain amount of dis-
cretion, see King, op. cit., passim.

⁵ This method was devised by Professor W. M. Persons and is now
extensively used. See his discussion of it in comparison with other
methods in “Indices of Business Conditions,” Review of Economic Sta-
tistics, Cambridge, Mass., Jan., 1919, pp. 18-31.

c. Inasmuch as a characteristic picture of the dispersion
of the relatives is desired, an average least affected by ex-
tremes should be used to secure it. Modes would be ideal,
but since they are not rigidly defined (indeed, there may be
no modes for the series in question) the median of the rela-
tives for each pair of compared months seems most appro-
priate. The medians for pig iron production are shown at the
bottom of Table 76.⁶


TABLE 76

Table Showing Monthly Link Relatives of Pig Iron Production
1903 to 1916

             Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.   Jan.
Years        ----   ----   ----   ----   ----   ----   ----   ----   -----  -----  ----   ----   ----
             Dec.   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

1903           94     94     ..    101    106     98     ..    102     ..     ..     73     81
1904          109    131     ..    107     99     84     ..    106     ..     ..    102    109
1905          110     90     ..     99    102     91     ..    106     ..     ..     98    102
1906          101     92     ..     96    101     94     ..     96     ..     ..    100    102
1907           99     92     ..    100    104     97     ..    100     ..     ..     78     67
1908           85    103     ..     94    101     94     ..    112     ..     ..    101    110
1909          103     95     ..     95    108    103     ..    107     ..     ..     98    103
1910           99     92     ..     95     96     95     ..     98     ..     ..     91     93
1911           99    101     ..     94     92     95     ..    107     ..     ..     95    102
1912          101    102     ..     99    106     97     ..    104     ..     ..     98    106
1913          100     92     ..    100    103     93     ..    100     ..     ..     88     89
1914           95    100     ..     97     92     92     ..    102     ..     ..     85    100
1915          106    105     ..    102    107    105     ..    108     ..     ..     97    105
1916          100     97     ..     97    104     96     ..     99     ..     ..     94     96
1917           99

Medians     100.0   96.0     ..   98.0  102.5   95.0     ..  103.0     ..     ..   96.0  102.0

Chain
Relatives   100.0   96.0  109.4  107.2  109.9  104.4  104.4  107.5  108.6  116.8  112.1     ..  114.3

Adjusted
Relatives   100.0     ..  107.0  103.7  105.1   98.8   97.7   99.5   99.3  105.6  100.2  101.1  100.0

Seasonal
Indices      98.9   93.9  105.9  102.6  104.0   97.7   96.6   98.4   98.3  104.5   99.2  100.0


d. “Chain” the median relatives; that is, successively mul-
tiply them together. The amount for January is taken as
100 and multiplied by the median for
$\dfrac{\text{February}}{\text{January}}$ (96.0). This gives the chain relative of
$\dfrac{\text{February}}{\text{January}}$. This number is then multiplied by the
$\dfrac{\text{March}}{\text{February}}$ median, which gives the
$\dfrac{\text{March}}{\text{February}}$ chain relative. This process is continued
until chain relatives for all of the months are secured. The last
multiplication gives the chain relative for December. If there is no
secular or long-time trend, the amount for the last
$\dfrac{\text{January}}{\text{December}}$ will be the same as for the first
January. If the trend is downward, the last item will be less than the
first; if it is upward, it will be more. The method of treating this
deficit or excess (as in the case of pig iron; see “chain relatives,”
bottom of Table 76) is described in the paragraph immediately following.

⁶ In some instances, the average of the middle three, or of the middle
four items is taken rather than the median item. If no seasonal move-
ment is apparent from the frequency groups, one may sometimes be devel-
oped by widening the groups.

e. Since the medians and chain relatives are taken as typical 
of the entire period, the excess, 14.3 per cent, may be regarded 
as the average trend. This must be distributed over the 12 
monthly relatives. Since the chain relatives were secured by 
successively multiplying together the medians, any error in 
seasonal change due to the trend is cumulated from month to 
month during the year. Accordingly, the excess must be 
spread over the different months. This may be done arith- 
metically or geometrically, the latter basis being used to secure 
the “adjusted relatives” in Table 76.⁷


⁷ If the error in the median link relatives is $d$ and the new January
chain relative is $A$ (expressed as a decimal; in this case 1.143) then
$(1 + d)^{12} = A$.
The value of the amount to be distributed may be found from this equa-
tion by the use of logarithms. The January chain relative is unaffected
by this adjustment. The one for February is divided by $(1 + d)$; the
one for March, by $(1 + d)^2$; the one for April by $(1 + d)^3$; and so on,
the one for December being divided by $(1 + d)^{11}$. The new January is
100; that is, its excess has been distributed geometrically over the pre-
ceding eleven months.

Arithmetically, $\tfrac{1}{12}$ of the discrepancy should be deducted from the
January relative; $\tfrac{2}{12}$ from the February relative, and so on, giving 100
as the relative for the new January on December.

f. The adjusted relatives secured in step “e” are in terms
of January as a base. The last step is to express them in
terms of the average for the year as a base. This is done
by dividing each of them by $\tfrac{1}{12}$ of the total of the twelve
items and multiplying by 100. In this form they are given
in the last line of Table 76. These are the adjusted monthly
indexes of seasonal variation. These indexes are plotted as
the broken line above and below the base 100 in Figure 84.
They are inserted in column 6 of Table 77.
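
Steps a to f can be followed through in a short Python sketch. It is an illustration under stated assumptions, not a transcription of Persons' worksheets. The function names are hypothetical. Of the twelve medians used, those for March/February (114.0), July/June (100.0), September/August (101.0), and October/September (107.5) are not read from Table 76 but inferred from the ratios of successive chain relatives, and those for December/November and January/December are taken as the medians of their columns. The results therefore agree with the printed indexes only to within rounding.

```python
from statistics import median

def link_relatives(series):
    """Step a: each item as a percentage of the item preceding it."""
    return [100 * cur / prev for prev, cur in zip(series, series[1:])]

def seasonal_indices(medians):
    """Steps d to f. `medians` lists the twelve median link relatives in the
    order Feb/Jan, Mar/Feb, ..., Dec/Nov, Jan/Dec."""
    chain = [100.0]                               # January is taken as 100
    for m in medians:
        chain.append(chain[-1] * m / 100)         # step d: chain the medians
    new_january = chain[12]
    ratio = (new_january / 100) ** (1 / 12)       # (1 + d), from (1 + d)**12 = A
    adjusted = [chain[k] / ratio ** k for k in range(12)]  # step e, geometric
    average = sum(adjusted) / 12
    indices = [100 * a / average for a in adjusted]        # step f
    return indices, new_january

# Step c for one pair of months: the Feb/Jan relatives of Table 76.
feb_on_jan = [94, 131, 90, 92, 92, 103, 95, 92, 101, 102, 92, 100, 105, 97]
feb_median = median(feb_on_jan)

# ASSUMED inputs: 114.0, 100.0, 101.0 and 107.5 are inferred from the chain
# relatives; they are not legible in the table itself.
medians = [96.0, 114.0, 98.0, 102.5, 95.0, 100.0,
           103.0, 101.0, 107.5, 96.0, 102.0, 100.0]
indices, new_january = seasonal_indices(medians)

print(feb_median, round(new_january, 1))
print([round(i, 1) for i in indices])
```

Dividing the k-th chain relative by the k-th power of (1 + d) is the geometric distribution of the excess described in the footnote; the arithmetic alternative would subtract equal steps instead.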

The seasonal variation may be eliminated from a series by
subtracting the seasonal indexes month by month each year
from the percentage ratios of the actual items to the ordinates
of trend. (See Table 77, column 7, which contains the differ-
ences taken to the nearest per cent.) A graphic representa-
tion of the original data of pig iron production, after they are
corrected for both long-time trend and seasonal variation, is
shown in Figure 85. The line in this chart (plotted as per-
centage deviations from a zero or no change line), therefore,
represents the cyclical changes (plus the accidental variations)
in this series. Technically, it is known as the “Line of Per-
centage Deviations of Original Items from Secular Trend
Corrected for Seasonal Variation.”⁸

FIGURE 85

Pig Iron Production (Percentages) Corrected for both Secular
Trend and Seasonal Variation *

* Reproduced by the courtesy of the Editors of the Review of Economic
Statistics, Harvard Committee on Economic Research, Cambridge, Mass.

TABLE 77

Table Showing Actual Pig Iron Production, Least-Square
Ordinates of Trend, Seasonal Variation and Cycle Percent-
ages, 1903 to 1916; and Cycle Percentages of Interest Rates
on 60-90 Day Commercial Paper, New York, 1903-1916

                 |--------------------- Pig Iron Production ---------------------|
  1     2           3          4          5          6          7          8          9
Year  Month      Produc-     Trend     Per Cent   Seasonal   Cyclical   Cycle      Cycle Per Cents
                 tion        (000's    of Trend   Variation  Variations Per Cents  of Interest Rates
                 (000's      of tons)  3 ÷ 4, %   %          5 - 6, %   7 ÷ σ      on 60-90 Day
                 of tons)                                               (19.1)     Commercial Paper,
                                                                                   New York, 1903-1916

1903  Jan.        1472       1416      104.0       98.9         5         .3        -.1
      Feb.        1390       1424       97.6       93.9         4         .2         .1
      Mar.        1590       1432      111.0      105.9         5         .3         .5
      Apr.        1608       1440      111.7      102.6         9         .5         .2
      May         1713       1448      118.3      104.0        14         .7        -.1
      June        1673       1456      114.9       97.7        17         .9         .5
      July        1546       1463      105.7       96.6         9         .5         .4
      Aug.        1571       1471      106.8       98.4         8         .5         .5
      Sept.       1553       1479      105.0       98.3         7         .4         .3
      Oct.        1425       1487       95.8      104.5        -9        -.5        -.1
      Nov.        1039       1495       69.5       99.2       -30       -1.6         .3
      Dec.         846       1503       56.3      100.0       -44       -2.3         .0

1904  Jan.         921       1511       61.0       98.9       -38       -2.0        -.3
      Feb.        1205       1519       79.3       93.9       -15        -.8         .0
      Mar.        1447       1527       94.8      105.9       -11        -.6        -.3
      Apr.        1555       1535      101.3      102.6        -1        -.1        -.7
      May         1534       1543       99.4      104.0        -5        -.3        -.8
      June        1292       1551       83.3       97.7       -14        -.8       -1.0
      July        1106       1559       70.9       96.6       -26       -1.4       -1.3
      Aug.        1167       1567       74.5       98.4       -24       -1.3       -1.5
      Sept.       1352       1575       85.8       98.3       -12        -.6       -1.4
      Oct.        1450       1583       91.6      104.5       -13        -.7       -1.3
      Nov.        1486       1591       93.4       99.2        -6        -.3       -1.4
      Dec.        1616       1598      101.1      100.0         1         .1       -1.4

3. CYCLICAL FLUCTUATIONS

The original data, although corrected for trend and seasonal 
variations as shown in Figure 85, still contain the fluctuations 
which are due to accidental and fortuitous circumstances. 
While in a particular case, some satisfactory method might be
determined for measuring and removing them too, it is use-
less to attempt to derive a method which will be generally
applicable.⁹ Accordingly, cycle percentages, determined in the
manner discussed above, represent true cycles only when they
do not contain accidental fluctuations.

The cyclical fluctuations of time series differ in two major 
respects: (1) in the amplitude or extent of the variations, and 
(2) in the time of their occurrence. If the cycles in two 
series are to be compared, therefore, both of these differences 
must be taken into account. The ways in which this is done 
are of interest. 

The percentage deviations of cyclical fluctuations in two
or more time series may be reduced to a comparable basis
by dividing them item by item by the standard deviation of
the series to which they belong. This measure of dispersion
reduces them to a common denominator in the same way
that it does the deviations of items from their respective
averages.¹⁰ Such percentages, called “cycles,” may then be
plotted on a common scale in units of standard deviations.
When this is done, the extent or degree of fluctuation through-
out the different phases of the cycle becomes directly com-
parable.

⁸ An expression found in the writings of Professor Persons, who worked
out the above method, and employed in the various studies of the Harvard
Committee on Economic Research, Cambridge, Mass.

⁹ See Persons, W. M., “An Index of General Business Conditions,” The
Review of Economic Statistics, April, 1919, pp. 137-138, wherein a method
of isolating the irregular fluctuations for the value of building permits,
1903-1916, is worked out.

¹⁰ See the discussion of the coefficient of dispersion based on the standard
deviation, supra, p. 355.

TABLE 77 (Continued)

                 |--------------------- Pig Iron Production ---------------------|
  1     2           3          4          5          6          7          8          9
Year  Month      Produc-     Trend     Per Cent   Seasonal   Cyclical   Cycle      Cycle Per Cents
                 tion        (000's    of Trend   Variation  Variations Per Cents  of Interest Rates
                 (000's      of tons)  3 ÷ 4, %   %          5 - 6, %   7 ÷ σ      on 60-90 Day
                 of tons)                                               (19.1)     Commercial Paper,
                                                                                   New York, 1903-1916

1905  Jan.        1781       1606      110.9       98.9        12         .6       -1.1
      Feb.        1597       1614       98.9       93.9         5         .3        -.9
      Mar.        1936       1622      119.4      105.9        13         .7       -1.0
      Apr.        1922       1630      117.9      102.6        15         .8        -.8
      May         1963       1638      119.8      104.0        16         .8        -.7
      June        1793       1646      108.9       97.7        11         .6        -.8
      July        1741       1654      105.3       96.6         9         .5        -.7
      Aug.        1843       1662      110.9       98.4        13         .7       -1.0
      Sept.       1899       1670      113.7       98.3        15         .8        -.9
      Oct.        2053       1678      122.3      104.5        18         .9        -.8
      Nov.        2014       1686      119.5       99.2        20        1.0         .1
      Dec.        2045       1694      120.7      100.0        21        1.1         .2

1906  Jan.        2068       1702      121.5       98.9        23        1.2         .1
      Feb.        1904       1710      111.3       93.9        17         .9         .4
      Mar.        2155       1718      125.4      105.9        20        1.0         .5
      Apr.        2073       1726      120.1      102.6        18         .9         .8
      May         2098       1733      121.1      104.0        17         .9         .8
      June        1976       1741      113.5       97.7        16         .8         .8
      July        2013       1749      115.1       96.6        18         .9         .8
      Aug.        1926       1757      109.6       98.4        11         .6         .9
      Sept.       1960       1765      111.0       98.3        13         .7        1.1
      Oct.        2196       1773      123.9      104.5        19        1.0         .8
      Nov.        2187       1781      122.8       99.2        24        1.3         .9
      Dec.        2235       1789      124.9      100.0        25        1.3         .8

1907  Jan.        2205       1797      122.7       98.9        24        1.3        1.3
      Feb.        2045       1805      113.3       93.9        19        1.0        1.4
      Mar.        2226       1813      122.8      105.9        17         .9        1.5
      Apr.        2216       1821      121.7      102.6        19        1.0        1.3
      May         2295       1829      125.5      104.0        21        1.1         .9
      June        2234       1837      121.6       97.7        24        1.3        1.2
      July        2255       1845      122.2       96.6        26        1.4        1.1
      Aug.        2250       1853      121.4       98.4        23        1.2        1.3
      Sept.       2183       1861      117.3       98.3        19        1.0        1.5
      Oct.        2336       1868      125.1      104.5        20        1.1        1.7
      Nov.        1828       1876       97.4       99.2        -2        -.1        2.2
      Dec.        1234       1884       65.5      100.0       -34       -1.8        2.7


The cycle percentages for pig iron production, 1903 to 1916, 
are found in column 8 of Table 77. In this case each of the 
percentages in column 7 has been divided by 19.1 — the 
standard deviation for this series. 
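The arithmetic behind columns 5, 7, and 8 can be sketched in a few lines of Python. This is only an illustration: the function name `cycle_per_cent` is hypothetical, the three rows are the last three months of 1907 as printed in Table 77, and 19.1 is the standard deviation quoted in the text.

```python
# Rows copied from Table 77:
# (month, production in 000's of tons, trend in 000's of tons, seasonal index in per cent)
rows = [
    ("Oct. 1907", 2336, 1868, 104.5),
    ("Nov. 1907", 1828, 1876, 99.2),
    ("Dec. 1907", 1234, 1884, 100.0),
]

SIGMA = 19.1  # standard deviation of the cyclical variations, as given in the text


def cycle_per_cent(production, trend, seasonal, sigma=SIGMA):
    """Hypothetical helper reproducing columns 5, 7 and 8 of Table 77."""
    per_cent_of_trend = 100.0 * production / trend  # column 5 = column 3 / column 4
    cyclical = per_cent_of_trend - seasonal         # column 7 = column 5 - column 6
    return cyclical / sigma                         # column 8 = column 7 / sigma


for month, production, trend, seasonal in rows:
    print(month, round(cycle_per_cent(production, trend, seasonal), 1))
```

Rounded to one decimal, the three results agree with the entries printed in column 8 for these months (1.1, -.1, and -1.8).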

But the "timing" of cyclical fluctuations in different series
varies. If it is desired to compare both the amplitude and the
congruence of change, then the method of correlation must be used.
A discussion of this phase of the subject immediately follows.

IV. The Correlation of Time Series

The distinction between correlation and narrow causation
was fully developed in Chapter XIII. Nothing further needs
to be said about it here except again to call attention to the
fact that comparisons generally involve some idea of establishing
causation or correlation. Now, the characteristic thing
about time series is that the items are "ordered in time,"
to use Professor Persons' phrase. The relations of the items
one to the other as thus ordered are due, among other things,
to long-time and short-time influences of a variety of types.
Accordingly, if the degree of association between historical
series is the object sought by comparison, it is useless to correlate
them until, so far as is possible, the different types of
fluctuations have been isolated. "It is of little avail (or
actually misleading) to compute the coefficient of correlation
from pairs of actual items. In case the two series possess
definite trends, or seasonal variation, the coefficient of correlation
for the items will yield a value different from zero. Having
found such a coefficient we would be unable to say what
contributed most largely to the result: similar (or diverse)
trends, seasonal variations, cyclical movements, or irregular
fluctuations."¹

1 Persons, W. M., "Correlation of Time Series" in Handbook of
Mathematical Statistics, H. L. Rietz, Editor in Chief, Houghton Mifflin,
Boston, 1924, pp. 150-151.

Bowley has stated the same thought as follows:

"If we take two things which are absolutely disconnected, except
that they are both phenomena arising in the progress of society, and
work out the coefficient by the straightforward rule, we shall find
there is some correlation. If two curves have short fluctuations
which are correlated, but opposite symptoms, then owing to the
symptom apart from the fluctuations there would be negative correlation,
while owing to the fluctuations apart from the symptom there
would be positive correlation; and when both are taken into account
the correlation may be positive, zero, or negative."²

If this is true, then correlation or association is best measured
by using series from which both the trend and the
seasonal variation have been eliminated. The cycle percentages
are distinctly less "ordered in time" than are the original
items. Series relating to business and economic phenomena
are much more alike in their cyclical relations alone than
they are in all of their fluctuations. Their trends and seasonal
variations are peculiar to themselves; their cyclical fluctuations
are the results of underlying business conditions affecting
industry and trade generally.
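The point made by Persons and Bowley can be illustrated with a small sketch. The two series below are synthetic and purely an assumption for illustration: each has a rising straight-line trend, while their short fluctuations are unrelated to one another. The helper names `pearson_r` and `deviations_from_trend` are hypothetical, and the trend is a least-squares straight line rather than any particular curve used in the text.

```python
import math


def pearson_r(x, y):
    """Product-moment coefficient of correlation for two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def deviations_from_trend(series):
    """Fit a least-squares straight line in time and return item minus trend."""
    n = len(series)
    times = list(range(n))
    mean_t, mean_s = sum(times) / n, sum(series) / n
    slope = (sum((t - mean_t) * (s - mean_s) for t, s in zip(times, series))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_s - slope * mean_t
    return [s - (intercept + slope * t) for t, s in zip(times, series)]


# Synthetic monthly items (an assumption): trend plus unrelated fluctuations.
months = range(24)
series_a = [100 + 5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in months]
series_b = [50 + 3 * t + 10 * math.cos(2 * math.pi * t / 12) for t in months]

r_actual_items = pearson_r(series_a, series_b)
r_trend_removed = pearson_r(deviations_from_trend(series_a),
                            deviations_from_trend(series_b))
print(round(r_actual_items, 2), round(r_trend_removed, 2))
```

The coefficient computed from the actual items is high only because both series rise with time; once the trends are removed, the coefficient for the unrelated fluctuations is close to zero.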

Two methods are available for correlating the cyclical
variations of two or more series: (1) the graphic method, and
(2) the use of the Pearsonian coefficient. The graphic method
indicates the fact of correlation, but it does not measure it.
Pearson's r does both. Moreover, the graphic method of superimposing
one "corrected" series over the other roughly indicates
the appropriate period of lag which will give the highest degree
of correlation. It does not, however, measure the correlation
for different "timings." This is done only by the use of the
numerical measure of correlation, Pearson's r. How is this
measure applied to "cycle percentages"?

The different steps in correcting original items for secular 
trend and seasonal variation, as outlined above for pig iron 
production, give a series of percentages. In order to make 
them comparable, it has been found to be appropriate to divide 

2 Bowley, A. L., Measurement of Groups and Series, Layton, London,
1903, p. 83.



TABLE 77 (continued)

Columns: (1) Year; (2) Month; (3) Pig iron production, 000's of tons;
(4) Trend, 000's of tons; (5) Per cent of trend, 3 ÷ 4; (6) Seasonal
variation, per cent; (7) Cyclical variations, 5 - 6, per cent;
(8) Cycle per cents, 7 ÷ σ (σ = 19.1); (9) Cycle per cents of interest
rates on 60-90 day commercial paper, New York, 1903-1916.

(1)   (2)       (3)     (4)     (5)     (6)     (7)     (8)     (9)
1911  Jan.     1759    2178    80.8    98.9     -18     -.9     -.6
      Feb.     1794    2186    82.1    93.9     -12     -.6     -.2
      Mar.     2188    2194    99.7   105.9      -6     -.3     -.6
      Apr.     2065    2202    93.8   102.6      -9     -.5     -.7
      May      1893    2210    85.7   104.0     -18     -.9     -.6
      June     1787    2218    80.6    97.7     -17     -.9     -.4
      July     1793    2226    80.5    96.6     -16     -.8     -.6
      Aug.     1926    2234    86.2    98.4     -12     -.6     -.6
      Sept.    1977    2242    88.2    98.3     -10     -.5     -.5
      Oct.     2102    2250    93.4   104.5     -11     -.6     -.8
      Nov.     1999    2258    88.5    99.2     -11     -.6    -1.1
      Dec.     2043    2266    90.2   100.0     -10     -.5     -.5
1912  Jan.             2273    90.5    98.9      -8     -.5     -.6
      Feb.             2281    92.1    93.9      -2     -.1     -.4
      Mar.             2289   105.1   105.9      -1     -.1     -.1
      Apr.     2375    2297   103.4   102.6       1      .1     -.1
      May      2512    2305   109.0   104.0       5      .3      .1
      June     2440    2313   105.5    97.7       8      .4      .1
      July     2410    2321   103.8    96.6       7      .4      .4
      Aug.     2512    2329   107.9    98.4       9      .5      .5
      Sept.    2463    2337   105.4    98.3       7      .4      .8
      Oct.     2689    2345           104.5      10      .5     1.1
      Nov.     2630    2353            99.2      13      .7     1.1
      Dec.     2782    2361   117.8   100.0      18      .9     1.3
1913  Jan.             2369            98.9      19     1.0      .7
      Feb.             2377   108.8    93.9      15      .8     1.0
      Mar.             2385   115.8   105.9      10      .5     1.8
      Apr.             2393   115.0   102.6      12      .7     1.6
      May      2822           117.5   104.0      14      .7     1.5
      June     2628           109.1    97.7      11      .6     2.3
      July             2416   106.0    96.6       9      .5     2.2
      Aug.             2424   104.9    98.4       6      .4     1.7
      Sept.            2432   103.0    98.3       5      .3     1.2
      Oct.     2546    2440   104.4   104.5       0      .0     1.0
      Nov.     2233    2448    91.2    99.2      -8     -.4     1.0
      Dec.     1983    2456    80.7   100.0     -19    -1.0     1.0

them by the standard deviation of the series to which they
belong. In this form, they are multiples of this common
divisor. Accordingly, to correlate them with another series
similarly corrected it is necessary only to multiply together
the corresponding deviations in the two series, algebraically
sum or total the products and divide by the number of paired
items involved. This follows because (1) in each of two series
the algebraic sum of the deviations from the line of secular
trend equals or closely approximates zero,¹ and (2) the cycle
percentages are themselves expressed in units of standard
deviations. Accordingly, the formula

\[ r = \frac{\sum xy}{n \, \sigma_1 \sigma_2} \]

for original data, where x and y are the paired deviations and n is
the number of pairs, becomes

\[ r = \frac{\sum xy}{n} \]

for cycle percentages.
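A minimal sketch may make the simplification concrete. It assumes two short made-up series of deviations (not taken from Table 77) and hypothetical function names; it checks that the mean product of deviations expressed in standard-deviation units equals the coefficient given by the full product-moment formula.

```python
import math


def standardize(values):
    """Express deviations from the mean in units of the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sigma for v in values]


def r_from_standard_units(u, v):
    """r for series already in standard-deviation units: sum of products over n."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def r_product_moment(x, y):
    """Full formula: sum(xy) / (n * sigma_1 * sigma_2), x and y being deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sigma_1 = math.sqrt(sum(d * d for d in dx) / n)
    sigma_2 = math.sqrt(sum(d * d for d in dy) / n)
    return sum(a * b for a, b in zip(dx, dy)) / (n * sigma_1 * sigma_2)


# Made-up deviations, for illustration only.
series_1 = [3, -1, 4, 0, -6]
series_2 = [1, 0, 2, -1, -2]

short_form = r_from_standard_units(standardize(series_1), standardize(series_2))
full_form = r_product_moment(series_1, series_2)
print(round(short_form, 4), round(full_form, 4))
```

Both routes give the same coefficient, which is why the cycle percentages, once divided by their standard deviation, need only be multiplied in pairs, summed, and divided by the number of pairs.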

The cycle percentages for interest rates on 60-90 day commercial
paper in New York² are shown in Table 77, column 9.
If these two series are correlated by pairing corresponding
months, that is, by multiplying the (.3) for January, 1903,
pig iron production in column 8 of Table 77 by the (-.1) for
January, 1903, 60-90 day interest rate, in column 9; the
February (.2) by the February (.1); and so on for the remainder
of the months during 1903 to 1916, the correlation coefficient
r is found to be + .109. If coefficients are worked out
with interest rates lagged after pig iron production, different
results will be secured. If interest rates are lagged 4 months,
that is, if May, 1903, interest rate cycles are paired with
January, 1903, pig iron production cycles, June with February,
and so on, the correlation is + .50. Successive lagging of
interest rates gives the following coefficients: 5 months, + .52;
6 months, + .57; 7 months, + .58; 8 months, + .57; 9 months,
+ .57; 10 months, + .55. Accordingly, maximum correlation

1 The actual deviations will always equal zero, and the percentages
closely approximate it in most cases.

2 Data are taken from the Review of Economic Statistics, January,
1919, p. 122. They are secured in the same manner as the corresponding
data for pig iron production.





TABLE 77 (continued)

[Most of this page of the table is illegible in this copy. The entries
that can be recovered for columns 2 to 6 are given below; the year
labels and columns 7 to 9 are lost.]

(2)       (3)     (4)     (5)     (6)
                 2464
                 2472
                 2480
                 2488    91.2   102.6

         1516    2551
Jan.     1601    2559
Feb.     1675    2567
Mar.     2064    2575
Apr.     2116    2583
May      2263    2591
June     2381    2599
July     2563    2607
Aug.     2780    2615
Sept.    2853    2623
Oct.     3125    2631
Nov.     3037    2639
Dec.     3203    2647

occurs when interest rates are lagged seven months after pig
iron production. This is the time interval (in monthly units)
of "best fit" between cycles of interest rates on 60-90 day commercial
paper and cycles of production of pig iron for the
period 1903 to 1916.
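The search for the lag of "best fit" can be sketched as follows. The series here are synthetic and are an assumption made only for illustration: a smooth cycle stands in for pig iron production, and the same cycle delayed seven months stands in for interest rates. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment coefficient of correlation for two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_r(leader, follower, lag):
    """Pair leader[t] with follower[t + lag] and correlate the pairs."""
    n = len(leader) - lag
    return pearson_r(leader[:n], follower[lag:lag + n])


MONTHS = 168   # fourteen years of monthly items, as in 1903 to 1916
TRUE_LAG = 7   # lag built into the synthetic follower (an assumption)

leader = [math.sin(2 * math.pi * t / 40) for t in range(MONTHS)]
follower = [math.sin(2 * math.pi * (t - TRUE_LAG) / 40) for t in range(MONTHS)]

coefficients = {lag: lagged_r(leader, follower, lag) for lag in range(13)}
best_lag = max(coefficients, key=coefficients.get)
print(best_lag)
```

With real cycle percentages the coefficients would rise and fall less sharply, as the values + .50 to + .58 quoted above show, but the procedure of computing r for each successive lag and selecting the largest is the same.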

But different correlation coefficients would be secured if a
different period of time (as, for instance, 1903 to 1914) were
used.¹ Indeed, the size of the coefficient is of value for
determining not only the best fitting lag but also the best
fitting total period for which to correlate the cycle percentages.

Moreover, the coefficients of correlation of cycle percentages
of a great number of time series may be used as a
basis for selecting those which lag behind or precede other
series. It was by their use that Professor Persons originally
constructed from the annual data of a large number of statistical
series both a business barometer and a forecaster.²
The same method, elaborated and refined, when applied to
data for the pre-war period, 1903 to 1914, laid the foundation
for the present business barometric and forecasting lines
of the Index of General Business Conditions now currently
issued by the Harvard Committee on Economic Research, and
described later.³

1 For the coefficients for different periods of lag, see Persons, W.
M., "Correlation of Time Series" in Rietz, H. L. (Editor in Chief),
Handbook of Mathematical Statistics, Houghton Mifflin, 1924, pp. 162-163.

2 Persons, W. M., "Construction of a Business Barometer Based Upon
Annual Data," American Economic Review, December, 1916, pp. 739-769.

3 See infra, pp. 538-541. For a complete explanation of the method see
Persons, W. M., "Indices of Business Conditions," Review of Economic
Statistics, January and April, 1919, passim; Persons, W. M., "A Non-
Technical Explanation of the Index of General Business Conditions,"
Review of Economic Statistics, February, 1920, pp. 39-48; "The Harvard
Index of General Business Conditions: Its Interpretation," Harvard
Committee on Economic Research, 1923 (published separately); "The
Revised Index of General Business Conditions," Review of Economic
Statistics, July, 1923, pp. 187-195.

V. The Probable Error of the Correlation Coefficient
of Time Series¹

Having computed the correlation coefficient for two series of
random samples on the assumptions (1) that forces are at
work in each of them tending to produce normal distributions,
and (2) that these forces are not independent of each other,²
the probable error is computed in keeping with the theory of
error typical of such distributions.³ May the significance of
correlation coefficients in time series be tested in the same
manner? The answer must be sought in an analysis of how
completely, if at all, the foregoing assumptions hold for such
series.

As was noted above, time series are ordered in time, that 
is, each successive item holds its position in relation to the 
others, a succession of items of similar size tending to be the 
rule rather than the exception. In non-time or condition (at- 
tribute) series the order of the items has no significance. 
Moreover, in time series, random selection does not hold for 
the period of time for which trends, seasonal variations, and 
cyclical changes are determined. In fact a specific period is 
selected by design, care being taken to omit years which are 
exceptional — as for instance those during wars. The omission 
or inclusion of a year or of years may alter not only the trend 
but also the variations from trend for which characteristic 
pictures are being sought. The case is different with non-time 
series, the intent being to select at random as large a propor- 
tion of the population as is possible. 

It is apparent, therefore, that probable errors computed for 

1 See the discussion of this subject by Professor Persons in the Review
of Economic Statistics, April, 1919, pp. 124-127; "Correlation of Time
Series" in Handbook of Mathematical Statistics, Houghton Mifflin, Boston,
1924, pp. 150-165 at pp. 162-163; "Some Fundamental Concepts of
Statistics," Journal of the American Statistical Association, March, 1924,
pp. 1-8, at pp. 6-8.

2 See the discussion, pp. 406-410.

3 See the discussion, pp. 428-429.

coefficients of correlation between time series, even though
the latter are corrected for trend, and for seasonal and cyclical 
variations, do not have a probability meaning. As Professor 
Persons says: 

"Thus, the 'probable error' of 0.03 in a coefficient of correlation of
+0.75 between the monthly items of pig-iron production and money
rates six months later does not indicate, as one would conclude from 
the theory of probability, that the chances are billions to one against 
the independence of the two variables; or, to state the idea more 
specifically, that if we compute a coefficient from data of 'any' other 
actual period the chances are more than ten millions to one that its 
value would be over + 0.50. In fact, the significance of the 'probable 
error' of a constant computed from time series is not known, and, in 
practice, we do not view the world from the standpoint of mathe- 
matical probability. So that we are not surprised when we actually 
find that the coefficient of correlation between the adjusted figures 
for pig-iron production and money rates six months later for the 
period 1915-1918 is only -f-O.SS. We find sufficient explanation of 
this result, which is almost impossible and really astounding when 
viewed from the standpoint of random sampling, in the war demands 
for pig-iron, the tremendous imports of gold, government financing, 
and the inauguration of the Federal reserve system during the period 
in question. Neither are we surprised when we find that for the 
period 1919-1923 the maximum correlation between the two series 
is for a lag in money rates, not of six months, but of nine to twelve 
months. For this period includes the severe crisis and great financial 
stringency of 1920-1921, which dominated most of the items and 
hence the results. Thus in actual practice the statistician cannot 
reasonably assume ignorance of the peculiar circumstances pertaining 
to the special cases which constitute his material, and therefore he 
does not think in terms of random sampling and numerical probabil- 
ities. Granting as one must that consecutive items of a statistical 
time series are, in fact, related makes inapplicable the mathematical 
theory of probability."³


VI. Conclusion 

The treatment and correlation of time series involve the 
use of special statistical methods in many respects different 

3 Persons, W. M., "Some Fundamental Concepts of Statistics," Journal
of the American Statistical Association, March, 1924, p. 7.

from those commonly applied to other types of data. These
have to do with (1) the determination of long-time trends
and short-time variations of different types; (2) their isolation;
(3) the correction of original data for those influences;
and (4) the correlation of the "corrected" series.

The technique of analysis, briefly described in this chapter, 
while developed for the most part in connection with the study 
of the business cycle, has general application wherever time 
series are involved. The importance to be attached to each 
of the steps, however, differs from problem to problem. The 
methods should not be applied blindly, nor should they always 
be considered superior to others, which from time to time 
have been and are being developed to suit special conditions. 
The end to be accomplished by analysis is always important,
and the methods should be selected which will best help to
realize it.

CHAPTER XV


THE PRINCIPLES OF INDEX NUMBER MAKING AND USING

I. Introduction

Business men and students of economics and of social 
affairs use index numbers to measure changes in prices, wages, 
sales, production, stocks, and a multitude of other phenomena 
over a period of time. Rarely, however, are the sources of the 
data upon which they rest, the methods by which they are 
computed, and their suitability to special uses given considera- 
tion. 

The fact that index numbers are supposed to measure 
changes in such elusive things as prices of commodities and 
services, for instance, differing at different times, in different 
markets, and under varying conditions of sale and methods of 
calculation ought to be sufficient warning against their hasty 
use. But, unfortunately, this is not the case. Those which are 
designed for some special purpose are given general applica- 
tion, while those which are intended to measure general 
changes are applied to specific uses with little or no thought 
of the consequences. Their use and preparation are too often 
divorced. This comes about because index numbers of a
variety of types (not easily distinguished as to purpose,
method of calculation, etc., by the layman) are easily obtained,
and because those who have occasion to use index numbers
rarely have the time and training to prepare them. Instruction
in both index number making and using is needed.
It is the purpose of this and the following chapter to furnish
a basis for such instruction.

II. Index Numbers Defined and the Methods of
Computing Them Illustrated

Index numbers are a series of numbers by which changes in 
the magnitude of a phenomenon are measured from time to 
time or from place to place. For example, the number, 176, 
which shows the relation of the average wholesale price of 
a group of commodities in 1924 to their price in 1913 is an 
index number. The series of numbers expressing similar 
relations for prices in each year from 1913 to 1924 are known 
as index numbers. Moreover, the same expression is applied 
to numbers which show changes in prices between two or more 
places. Their purpose, therefore, is to reduce to a common
denominator the qualities of different phenomena (as prices,
stocks, production, etc.) so as to allow time and place
comparisons to be made.

But

"... it must be borne in mind that no index number corresponds
to a real thing. It is not like the mean of certain observations in
natural science (such, for example, as those for measuring the distance
between the earth and the sun) of which any one may err, but
whose average will point to a single specific fact. An index number
points to no single fact. It gives, to repeat, only an indication of a
general trend of prices. People often think and speak loosely on this
topic, as if an index number told the whole story once for all. There
is no one change in prices. There is a medley of many changes, different
in direction and degree. All that we can hope to secure by
averaging and summarizing is some concise statement of the general
drift."¹

The nature of an index number and the methods by which it 
may be computed may be illustrated by means of an example. 

An index number is wanted which will show the movement
of wholesale prices of paper in Chicago from 1913 to 1921.
Price data are available from books of jobbers on the following
types of paper: "newsprint," "wrapping," "book," "fine,"
"paper-board," and "miscellaneous." How can these different
phenomena ("prices" of different grades of "paper") be reduced
to a common denominator so as to allow a time comparison
to be made?

1 Taussig, F. W., Principles of Economics (Revised Edition, 1915),
Macmillan, New York, Vol. I, p. 294.

The prices come from different jobbers, apply to different 
kinds of paper and are yearly averages. Accordingly, both 
prices and paper must be made comparable. The prices may
be averaged, and quoted for uniform quantities of 100 lbs. The
types of paper to which they apply cannot be averaged, but
can be compared for the different jobbers so as to secure uniform
grades. Reserving for later discussion the principles
which such a problem presents, various index numbers of 
prices may be constructed. 

The average yearly prices and the types of paper used in 
the illustration are shown in Table 78. 

TABLE 78

Average Wholesale Prices of Different Types of Paper in Chicago, 1913-1921

                      Number                    Average Prices in Units of 100 lbs.
Line  Types of Paper  of Grades    1913    1914    1915    1916    1917    1918    1919    1920    1921
1     Newsprint *          1      $3.25   $3.25   $3.25   $5.07   $6.56   $5.60   $6.31  $11.94   $8.19
2     Wrapping †           2       4.53    4.27    4.24    7.52    9.90    9.92    9.56   14.56   10.53
3     Book ‡               6       6.60    6.61    6.70    9.75   11.28   12.08   13.16   19.54   14.50
4     Fine §              11      10.81   10.90   11.29   15.38   17.98   19.93   22.85   29.51   24.49
5     Paper-board ‖        4       4.75    4.75    4.73    6.42    7.73    8.72    9.58   12.56    9.72
6     Miscellaneous ¶      3       9.12    9.19    9.49   13.99   16.97   18.66   20.85   27.26   23.30
7     Average                      6.51    6.49    6.62    8.02   11.74   12.49   13.72   19.23   15.12

* Standard Newsprint.
† Kraft, Manila.
‡ Sized and super-calendered, Machine finished, Eggshell, Coated, Coated (high
grade), Cover.
§ Ledger (cheap), Ledger (medium), Ledger (good), Bond (cheap), Bond
(medium), Bond (good), Writing (manila), Writing (medium), Writing (good),
Writing (French), Onion skin.
‖ Bristol, Straw, Jute, Pulp.
¶ Document manila, Blotting (white), Envelopes.

1. THE AVERAGE OF RELATIVES (RATIOS) METHOD

(1) Average of Relatives (Ratios) 

a. Fixed Base 

If the 1913 average price of each type of paper is taken as
100, and the price in each of the other years is expressed as a
ratio to this amount and multiplied by 100, the relatives
shown in Table 79, lines 1 to 6, are secured.

The process of computing the relatives or percentages may
be illustrated as follows: The average price of newsprint in
1916 was $5.07 per 100 lbs. The average price of the corresponding
type of paper in the base year, 1913, was $3.25.
Accordingly, the relative price of this paper in 1916 was

\[ \frac{5.07}{3.25} \times 100 = 156. \]

This number is a per cent, or as is indicated above, a relative.
Similarly, the average price of "miscellaneous" paper in 1920
was $27.26. The 1913 average price was $9.12. Therefore, the
relative price in 1920 was

\[ \frac{27.26}{9.12} \times 100 = 299. \]

All of the relatives in Table 79 are computed in this manner.
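The same computation can be sketched for a whole row of Table 78. The function name `fixed_base_relatives` is hypothetical; the prices are the newsprint prices printed in Table 78, and each relative is rounded to the nearest whole number as in Table 79.

```python
YEARS = list(range(1913, 1922))

# Newsprint prices per 100 lbs., copied from Table 78, line 1.
newsprint_prices = [3.25, 3.25, 3.25, 5.07, 6.56, 5.60, 6.31, 11.94, 8.19]


def fixed_base_relatives(prices, base_index=0):
    """Express each price as a percentage of the base-year price."""
    base_price = prices[base_index]
    return [round(100 * price / base_price) for price in prices]


newsprint_relatives = fixed_base_relatives(newsprint_prices)
for year, relative in zip(YEARS, newsprint_relatives):
    print(year, relative)
```

The figure for 1916 comes out at 156, as in the worked example above, and the remaining figures match the newsprint line of Table 79.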

TABLE 79

Relative Wholesale Prices of Paper in Chicago, 1913 to 1921
(1913 = 100)

                                     Percentages or Relatives, 1913 = 100
Line  Types of Paper           1913    1914    1915    1916    1917    1918    1919    1920    1921
1     Newsprint                 100     100     100     156     202     172     194     367     252
2     Wrapping                  100      94      94     166     219     219     211     321     232
3     Book                      100     100     102     148     171     183     199     296     220
4     Fine                      100     101     104     142     166     184     211     273     227
5     Paper-board               100     100     100     135     163     184     202     264     205
6     Miscellaneous             100     101     104     153     186     205     229     299     255
7     Total of Relatives                596                            1147    1246    1820    1391
8     Average of Relatives               99                     185     191     208     303     232
9     Median                                            151     179     184     207     298     230
10    Geometric Mean                     99                     183     191             303     231

Lines 8, 9, and 10, respectively, of Table 79 show arithmetic
means, medians, and geometric means computed from these
relatives.

The arithmetic mean in each year is the result of dividing
the sum of the relatives by six. The medians are secured by
arranging the relatives each year in order of magnitude and
taking the middle item. In all but three years (1913, 1914,
and 1918) interpolation was necessary in order to find a
precise median.¹

The geometric mean of relatives each year is gotten by
multiplying together the relatives and taking the 6th root.
This is done by logarithms as follows: (1) find the log of
each of the relatives, (2) add the logs together, (3) divide
the sum by 6, and (4) look up the natural number corresponding
to the quotient in (3). The natural number is the index
for the year in question.
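The three averages can be sketched for a single year. The relatives are those printed for 1921 in Table 79; the function names are hypothetical, and the median is taken as the midpoint of the two middle items, which is an assumption about how the interpolation is carried out when the number of items is even.

```python
import math

# Relatives for 1921 (1913 = 100), copied from Table 79, lines 1 to 6.
relatives_1921 = [252, 232, 220, 227, 205, 255]


def arithmetic_mean(values):
    """Sum of the relatives divided by their number."""
    return sum(values) / len(values)


def median(values):
    """Middle item in order of magnitude; midpoint of the two middle items if even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def geometric_mean_by_logs(values):
    """Steps (1) to (4): take logs, add, divide by the count, take the antilog."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))


mean_1921 = arithmetic_mean(relatives_1921)
median_1921 = median(relatives_1921)
geometric_1921 = geometric_mean_by_logs(relatives_1921)
print(round(mean_1921, 1), median_1921, round(geometric_1921, 1))
```

Rounded to whole numbers these give 232, 230, and 231, the entries for 1921 in lines 8, 9, and 10 of Table 79.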

b. Chain Base 

In Table 79 the relative or percentage numbers are based 
on 1913. In Table 80, however, they are based on the preced- 
ing year. That is, the years are linked together. Line 7 gives 
the averages of the link-relatives, and line 8, the chain-rela- 
tives based on 1913. 

The chain-relatives are secured from the average link-relatives
as follows: The average link-relative for 1913 (100)
is multiplied by the link-relative for 1914 on 1913 (99), each
link-relative being treated as a percentage. This
gives the chain-relative, 99, for 1914 on 1913. The chain-relative
for 1915 on 1913 (100) is secured by multiplying the
link-relative for 1914 on 1913 (99) by the link-relative for
1915 on 1914 (101). The chain-relative for 1916 on 1913
(150) is secured by multiplying the link-relatives, 99 × 101
× 150. The remaining chain-relatives are secured in a similar
manner.
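The chaining can be sketched as follows. The link-relatives are those of line 7 of Table 80. Rounding each chain-relative to the nearest whole number before the next multiplication is an assumption; it is adopted here because it reproduces the figures of line 8. The function name is hypothetical.

```python
# Average link-relatives, 1913 to 1921, copied from Table 80, line 7.
link_relatives = [100, 99, 101, 150, 123, 104, 109, 147, 77]


def chain_relatives(links):
    """Multiply each link (taken as a percentage) into the preceding chain figure."""
    chain = [links[0]]  # the base year stands at 100
    for link in links[1:]:
        # Integer arithmetic with 50 added rounds halves upward (184.5 becomes 185).
        chain.append((chain[-1] * link + 50) // 100)
    return chain


chain = chain_relatives(link_relatives)
print(chain)
```

The series so obtained is 100, 99, 100, 150, 185, 192, 209, 307, 236.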

1 See Chapter IX, pp. 286-289, for a discussion of interpolation for
medians.

The amounts in line 8 are chain index numbers based upon
1913. Those in line 7 are relative or percentage numbers
showing average year-to-year changes.

TABLE 80

Table Showing Chain-Relative Index Numbers of Wholesale
Prices of Paper in Chicago, 1913 to 1921
(1913 = 100)

                                  Percentages or Relatives Based on Preceding Year
Line  Types of Paper           1913    1914    1915    1916    1917    1918    1919    1920    1921
1     Newsprint                 100     100     100     156     129      85     113     189      69
2     Wrapping                  100      94      99     177     132     100      96     152      72
3     Book                      100     100     101     146     116     107     109     149      74
4     Fine                      100     101     104     136     117     111     115     129      83
5     Paper-board               100     100     100     136     120     113     110     131      77
6     Miscellaneous             100     101     103     147     121     110     112     131      85
7     Average Link-Relatives    100      99     101     150     123     104     109     147      77
8     Chain-Relatives,
      1913 = 100                100      99     100     150     185     192     209     307     236


(2) Weighted Average of Relatives (Ratios)

In Table 79, the relative price of each type of paper is 
counted once in order to secure the index based on averages 
of relatives — a so-called unweighted figure. That is, the sum 
of the relatives in each year is divided by six. If weights, 
proportional to the value of each type of paper consumed 
in the United States, are assigned to the relatives, the weighted 
average of relatives index is as given in Table 81, line 8.¹

1 Neither the quantity nor the value of these types of paper consumed
in Chicago is available. Quantity weights for the United States in 1917
are found in Mitchell, W. C., History of Prices During the War, Bulletin
No. 31, Averill, W. A., "Prices of Paper," War Industries Board,
Washington, D. C., 1919. They are given on a proportional basis in Table 85.

For a weighted average of relatives index number, however, value
weights are desired. They may be secured from the quantity weights in
Table 85 as follows: (1) compute a weighted average price of all grades
of paper by multiplying the average value, type by type in Table 78, by
the corresponding weights as shown in Table 85; the average value is

TABLE 81

Table Giving Weighted Average of Relatives Index Numbers of Wholesale
Prices of Paper in Chicago, 1913 to 1921. Base Weights: Value of
Paper Consumed in 1917
(1913 = 100)

                                  Products of Weights and Relatives (See Table 79), to Nearest Whole Number
Line  Type of Paper     Weights    1913    1914    1915    1916    1917    1918    1919    1920    1921
1     Newsprint            20.4   2,040   2,040   2,040   3,182   4,121   3,509   3,958   7,487   5,141
2     Wrapping             12.4   1,240   1,166   1,166   2,058   2,716   2,716   2,616   3,980   2,877
3     Book                 18.4   1,840   1,840   1,877   2,723   3,146   3,367   3,662   5,446   4,048
4     Fine                 14.3   1,430   1,444   1,487   2,031   2,374   2,631   3,017   3,904   3,246
5     Paper-board          27.3   2,730   2,730   2,730   3,686   4,450   5,023   5,515   7,207   5,597
6     Miscellaneous         7.2     720     727     749   1,102   1,339   1,476   1,649   2,153   1,836
7     Total               100.0  10,000   9,947  10,048  14,782  18,146  18,722  20,417  30,177  22,745
8     Weighted Average *            100      99     100     148     181     187     204     302     227

* Products in Line 7 divided by sum of the weights, 100.

In order to secure yearly index numbers, the relative for 
each type of paper each year is multiplied by the value weight 
in 1917, the products totaled, and divided by the sum of the 
weights, 100. For example, the relative for newsprint in 1916 
based on 1913 is 156. The value weight for this type of paper 
in 1917 is 20.4. Accordingly, the product of the relative and 
the weight, 156 X 20.4, is 3182. The corresponding product 
for wrapping paper is 2058; for book paper, 2723. The prod- 
ucts for the other types in this year are given in the column 

Note 1 continued 

$5.08 per 100 lbs.; (2) express as a proportion of this quantity the 
average value of each type secured by multiplying the average price by a 
percentage representing its portion of the total quantity. For example : 
the average price of newsprint in 1913 was $3.25. Newsprint was 32 
per cent of the total consumed. Therefore, 32 per cent of $3.25 = $1.04, 
which is 20.5 per cent of $5.08, the weighted average value. The weights 
for the other types are computed in the same manner. 

If the prices of the different types of paper were expressed in different 
units — as, for instance, in 100 lbs., in rolls, in tons, etc. — it would be 
necessary to use weights measured in corresponding units. In this case, 
however, since the units are the same, the weights may be put on a pro- 
portional basis. 
for 1916. The sum of the weights is 100; therefore, the
weighted average of relatives index number for 1916 is

\[ \frac{14{,}782}{100} \approx 148. \]



The series of amounts in line 8 are weighted averages of 
relatives index numbers based on 1913. 
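
The arithmetic just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function name is hypothetical, the 1916 relatives are those listed in Table 83, and the weights are the 1917 value weights of Table 81. Because the relatives are entered here as rounded whole numbers, individual products may differ slightly from those printed in Table 81.

```python
# Relatives for 1916 (1913 = 100), as listed in Table 83.
relatives_1916 = {
    "Newsprint": 156, "Wrapping": 166, "Book": 148,
    "Fine": 142, "Paper-board": 136, "Miscellaneous": 153,
}

# Value weights for 1917, in per cent (Table 81).
value_weights = {
    "Newsprint": 20.4, "Wrapping": 12.4, "Book": 18.4,
    "Fine": 14.3, "Paper-board": 27.3, "Miscellaneous": 7.2,
}


def weighted_mean_of_relatives(relatives, weights):
    """Sum the products of relative and weight, then divide by the sum of the weights."""
    total_products = sum(relatives[name] * weights[name] for name in relatives)
    return total_products / sum(weights.values())


index_1916 = weighted_mean_of_relatives(relatives_1916, value_weights)
print(round(index_1916))  # index number to the nearest whole number
```

Running the same function on each year's relatives would give the whole of line 8.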

An index number based upon weighted medians of relatives 
is shown in Table 82, the weights for the different types of 
paper being the estimated proportions of the value consumed. 
In order to calculate weighted medians, the relatives must be 
arranged in order of magnitude, and the corresponding weights 
accompany them. The weights are the frequencies which must 
be divided into two equal parts in order to calculate the 
medians.¹


TABLE 82

Table Showing Weighted Medians of Relatives Index Numbers
of Wholesale Paper Prices, Chicago, 1913 to 1921
(1913 = 100)

Year      Index Number

1913          100
1914          100
1915          100
1916          148
1917          171
1918          184
1919          202
1920          296
1921          227


Table 83 illustrates the manner in which the relatives and 
the weights (frequencies) must be arranged in order to find 
the median. The arrangement refers to 1916. It should be 
observed that the order may and probably will be different 
each year. 

¹ See formulae for medians, supra, p. 283.
TABLE 83

Table Showing the Method of Computing a Weighted Median
of Relatives Index Number of Wholesale Paper Prices in
Chicago, 1916

                     Relatives        Value Weights
                     1916             1917
Types of Paper       Base = 1913      Per Cent

Paper-board             136              27.3
Fine                    142              14.3
Book                    148              18.4
Miscellaneous           153               7.2
Newsprint               156              20.4
Wrapping                166              12.4

Total                                   100

Weighted Median of Relatives   148
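
The ordering-and-halving procedure shown in Table 83 can be sketched as follows. This is only an illustrative sketch: the function name is hypothetical, and it assumes the simplest rule, namely that the weighted median is the first relative at which the cumulative weight reaches half the total. The formulae for medians referred to in the text may include an interpolation step that is not reproduced here.

```python
def weighted_median(relatives, weights):
    """Return the relative at which the cumulative weight first reaches half the total."""
    # Arrange the relatives in order of magnitude, each with its weight.
    pairs = sorted(zip(relatives, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for relative, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return relative
    raise ValueError("at least one positive weight is required")


# Relatives for 1916 and value weights for 1917, as listed in Table 83.
relatives_1916 = [136, 142, 148, 153, 156, 166]
value_weights = [27.3, 14.3, 18.4, 7.2, 20.4, 12.4]

median_1916 = weighted_median(relatives_1916, value_weights)
print(median_1916)
```

The cumulative weights run 27.3, 41.6, 60.0, so the half-way point (50) falls within the weight attached to book paper, whose relative is 148.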


2. RATIOS OF AVERAGES 

An alternative method to averaging the relatives (ratios) 
unweighted (see Table 79) or weighted (see Table 81) is to 
express the average price each year in the form of a ratio 
relative to the price in a base year. 

In Table 78, the prices for the different types of paper each 
year are given in units of 100 lbs. Line 7 of this table shows 
the simple average price in each of the years. If the different 
averages in this line are expressed as ratios with 1913 as a 
base, the index numbers are as given in Table 84. 

That is, the average price in 1913, $6.51, is taken as 100, the
average prices in the other years being expressed as ratios to
this amount and multiplied by 100. For instance, the index
number for 1917 is

\[ \frac{11.74}{6.51} \times 100 \approx 180. \]

The index numbers for the other years are computed in a similar manner.
Either the average price or the sum of the average prices
may be expressed in this manner. Since the totals are divided
by the same amount (six in this case) in order to get the
averages, the relations between the ratios for the different
years are identical in the two methods.

TABLE 84

Table Showing Ratios-of-Averages Index Numbers of Wholesale
Paper Prices in Chicago, 1913-1921
(1913 = 100)

            Average       Index Number
Years       Price         1913 = 100

1913        $ 6.51           100
1914          6.49           100
1915          6.62           102
1916          8.02           123
1917         11.74           180
1918         12.49           192
1919         13.72           211
1920         19.23           295
1921         15.12           232
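
As a check on the ratios-of-averages method, the following sketch recomputes the index numbers from the average prices of Table 84, taking the 1913 average as $6.51 as stated in the text. The function name is hypothetical. The second computation illustrates the remark that totals and averages give the same ratios; the totals are here simply assumed to be six times the averages.

```python
# Simple average prices per 100 lbs. (Table 84), with 1913 taken as $6.51.
average_prices = {
    1913: 6.51, 1914: 6.49, 1915: 6.62, 1916: 8.02, 1917: 11.74,
    1918: 12.49, 1919: 13.72, 1920: 19.23, 1921: 15.12,
}


def ratio_index(values, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = values[base_year]
    return {year: round(100 * value / base) for year, value in values.items()}


index_from_averages = ratio_index(average_prices, 1913)

# Totals of the six prices: dividing by six throughout leaves the ratios unchanged.
total_prices = {year: 6 * price for year, price in average_prices.items()}
index_from_totals = ratio_index(total_prices, 1913)

print(index_from_averages)
print(index_from_averages == index_from_totals)
```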


3. RATIOS OF WEIGHTED AGGREGATES

Instead of using (1) different unweighted averages of relatives
(as in Table 79), (2) different weighted averages of
relatives (as in Tables 81 and 82), or (3) ratios of averages
(as in Table 84), the actual prices may be weighted by suitable
quantities, totaled or aggregated, and expressed as ratios
relative to a given base. Index numbers computed in this
manner are given in Table 85, 1913 being used as the base.

The method of computing this type of an index is different
from that used in Table 81. In Table 85, the actual prices are
weighted by quantities; in Table 81, the relative prices are
weighted by values. It will be noticed, however, that the results
are the same.¹ The reason for this agreement is well

¹ A slight difference occurs for the year 1914, but this is due to the
treatment of decimal amounts.
TABLE 85

Table Giving Weighted Aggregate of Actual Prices Index Numbers of Wholesale
Prices of Paper in Chicago, 1913 to 1921

Base Weights as Proportions Consumed in 1917
(1913 = 100)

                        Quantity    Products of Price (See Table 78) and Weights (See Column 3):
                        Weights,    Per Cents of Total Consumption in the United States
                        Per Cent
Line  Types of Paper    Consumed     1913      1914      1915      1916      1917      1918      1919      1920      1921

1     Newsprint           32.0      104.0   [illeg.]  [illeg.]    162.2     209.9     179.2   [illeg.]    382.1     262.1
2     Wrapping            13.9    [illeg.]     59.4      58.9   [illeg.]    137.6     137.9     132.9     202.4     146.4
3     Book                14.2       93.7      93.9      95.1     138.5     160.2     171.5     186.9     277.5     205.9
4     Fine                 6.7       72.4   [illeg.]     75.6     103.0     120.5     133.5     153.1     197.7     164.1
5     Paper-board         29.2      138.7     138.7     138.1     187.5     225.7     254.6     279.7     366.6     283.8
6     Miscellaneous        4.0       36.5      36.8   [illeg.]     56.0      67.9      74.6      83.4     109.0      93.2

7     Total             [illeg.]    505.8   [illeg.]  [illeg.]    751.7     921.8     951.3   [illeg.]   1535.2    1155.6

8     Relatives *
      1913 = 100                      100       100       100       148       181       187       204       302       227

* To the nearest whole number.



expressed by Mitchell. He says:

“if we want an aggregate of actual prices, we merely multiply
the quotations of each commodity at each date by the physical
quantities used as weights, and add these products. To measure the
variations of these aggregates in terms of prices at the base period, we
have only to divide the aggregate for each period by the aggregate
for the base period. But if we plan to make a weighted arithmetic
mean of price variations, we begin by turning the quotations into
relative prices. That is, we divide the actual price of each
commodity at each date by its price in the base period. Then we weight
these relatives, not by physical quantities as in the first case, but by
the money values of the physical quantities at the prices of the base
year. But in this step the prices of the base year, which were just used
as divisors to get relative prices, are used again as factors by which
the relative prices are multiplied. Hence our results are the same as
if we had neither multiplied nor divided by the prices of the base
year, in other words, the same as if we had multiplied the quotations
of each commodity in each year by the physical quantities used as
weights. But that is just what we did when we set out to make an
aggregate of actual prices. So far, then, the two processes are
identical in their outcome. And the remaining steps are also the same.
The products must be added, and the sums divided by the physical
quantities used as weights times the actual prices of the base year.
Therefore, to make relative prices from aggregates of actual prices
is a shorter way of getting the same results as are obtained by
making similarly weighted arithmetic means of relative prices.”¹
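
The identity described in this passage can be verified numerically. In symbols, with base prices p0, later prices p1, and quantities q, the weighted mean of relatives is Σ[(p1/p0)(p0·q)] / Σ(p0·q), and the base prices cancel to leave Σ(p1·q) / Σ(p0·q), the ratio of aggregates. The sketch below is illustrative only: the three commodities, their prices, and their quantities are hypothetical figures chosen for the demonstration, not data from Table 78, and the function names are assumptions.

```python
import math

# Hypothetical data: three commodities with base-year and later prices
# and fixed physical quantities used as weights.
base_prices = {"A": 3.25, "B": 4.50, "C": 6.00}
later_prices = {"A": 5.07, "B": 7.47, "C": 8.10}
quantities = {"A": 32.0, "B": 14.0, "C": 54.0}


def aggregate_index(p0, p1, q):
    """Ratio of the quantity-weighted aggregate of actual prices to the base aggregate."""
    later_total = sum(p1[k] * q[k] for k in q)
    base_total = sum(p0[k] * q[k] for k in q)
    return 100 * later_total / base_total


def mean_of_relatives_index(p0, p1, q):
    """Arithmetic mean of price relatives, weighted by base-year money values."""
    value_weights = {k: p0[k] * q[k] for k in q}   # quantity times base price
    relatives = {k: 100 * p1[k] / p0[k] for k in q}
    products = sum(relatives[k] * value_weights[k] for k in q)
    return products / sum(value_weights.values())


from_aggregates = aggregate_index(base_prices, later_prices, quantities)
from_relatives = mean_of_relatives_index(base_prices, later_prices, quantities)
print(math.isclose(from_aggregates, from_relatives))
```

The two results agree up to floating-point rounding, which is the computational counterpart of the slight decimal discrepancy noted for 1914.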

4. SUMMARY OF RESULTS BY DIFFERENT METHODS

Different methods of computing index numbers for the 
wholesale prices of six types of paper in Chicago give varying 
results. These are compared in Table 86. 

TABLE 86

Index Numbers of Wholesale Prices of Paper in Chicago, 1913-1921,
Computed by Different Methods
(1913 = 100)

                     Averages of Relatives (Ratios)
        --------------------------------------------------------
                   Unweighted                    Weighted                     Weighted
        ----------------------------------   -------------------   Ratio      Aggregate
        Arithmetic Mean                                             of         of Actual
        Fixed    Chain              Geometric  Arithmetic           Averages   Prices
Year    Base     Base     Median    Mean       Mean *     Median

1913     100      100      100       100        100        100       100        100
1914      99       99      100        99         99        100       100        100
1915     101      100      101       101        100        100       102        100
1916     150      150      151       150        148        148       123        148
1917     185      185      179       183        181        171       180        181
1918     191      192      184       191        187        184       192        187
1919     208      209      207       207        204        202       211        204
1920     303      307      298       303        302        296       295        302
1921     232      236      230       231        227        227       232        227

* See the comment, p. 478, relative to the results by these methods.


In some cases the differences are large, in others, negligible.
For two methods the numbers are identical throughout.
Moreover, for certain years all methods give the same results.
This single body of price data has served to illustrate the
arithmetic of the more important methods of computing index
numbers. The remaining discussion of the chapter is concerned
with the principles back of the methods. It will help
to explain the reasons for the differences and similarities.

¹ Mitchell, op. cit., pp. 80-81.

III. The Uses of Index Numbers

In what has gone before, plan and purpose in statistical 
study have been emphasized. Both need to be especially 
stressed in connection with index numbers, because, while 
most of those that are currently used are of the “general
purpose” type, they are given a variety of special uses.

“Few of the widely used index numbers . . . are made to serve
one special purpose. On the contrary, most of them are ‘general-purpose’
series, designed with no aim more definite than that of
measuring changes in the price level. Once published they are used
for many ends: to show the depreciation of gold, the rise in the cost
of living, the alternations of business prosperity and depression, and
the allowance to be made for changed prices in comparing estimates
of national wealth or private income at different times. They are
cited to prove that wages ought to be advanced or kept stable; that
railway rates ought to be raised or lowered; that ‘trusts’ have
manipulated the prices of their products to the benefit or the injury
of the public; that tariff changes have helped or harmed producers
or consumers; that immigration ought to be encouraged or restricted;
that the monetary system ought to be reformed; that natural
resources are being depleted or that the national dividend is growing.
They are called in to explain why bonds have fallen in price and
why interest rates have risen, why public expenditures have
increased, why social unrest prevails in certain years, why farmers
are prosperous or the reverse, why unemployment fluctuates, why
gold is being imported or exported, and why political ‘landslides’
come when they do.”²

Generally speaking, however, two major purposes, so far as
price indexes are concerned, are distinguishable: (1) to measure
general changes in prices, and (2) to interpret the effect
of the changes upon various classes of people.

² Mitchell, Wesley C., “Index Numbers of Wholesale Prices in the
United States and Foreign Countries,” Bulletin of the United States
Bureau of Labor Statistics, Whole Number 173, July, 1915, pp. 25-26.

An index number serving the first use is computed from the
prices of a wide selection of commodities covering all phases
of industry; one designed for the second purpose, from the
commodities the changes in prices of which have special reference
to the class concerned.¹ For instance, the United States
Bureau of Labor Statistics publishes index numbers of wholesale
prices based upon 404 commodities, the selection being
made with the intent of sampling the general markets. On the
other hand, the same Bureau publishes index numbers of retail
prices of foods, the commodities being selected from industrial
centers and referring to articles currently purchased by
so-called workingmen’s families.² Their purpose is to serve as
a basis for approximating the effect of price changes upon
consumers. A variety of special purpose types of index numbers
are now issued, the more important of which are described
in Chapter XVI.

But index numbers are not restricted to price phenomena.
Any phenomenon extending over a period of time and ex- 
pressed numerically may be put in this form, the only peculi- 
arity being that its relative rather than its absolute aspect is 
exhibited. Index numbers of wages, rents, imports, exports, 
sales, production, or of any other phenomenon may be con- 
structed. Some of the more important of these non-price series 
are described in Chapter XVI.

IV. Principles of Index Number Making

Because the uses which are made of index numbers of prices 
and of other phenomena vary widely, and because different 
methods are available according to which they may be constructed,
the question of the purpose which they are to serve
is of first importance.

¹ See the discussion of this index number, Chapter XVI, pp. 516-518.

² See the discussion of this index number, Chapter XVI, pp. 520-521.
Generally speaking, the purpose of an index number is, as
Fisher says, “that it shall fairly represent, so far as one single
figure can, the general trend of the many diverging ratios from
which it is calculated. It should be a ‘just compromise’ among
conflicting elements, the ‘fair average,’ the ‘golden mean.’
Without some kind of fair splitting of the differences involved,
an index number is apt to be unsatisfactory, if not absurd.”¹
The difficulty of securing such a “fair average” can be appreciated
only by a detailed study of the index numbers currently
issued, and of the principles involved in index number
making.²

1. THE ATTRIBUTES OF INDEX NUMBERS AND THE STEPS IN
THEIR CONSTRUCTION

Fisher enumerates as follows the attributes of an index 
number: 


(1) “As to the Construction of the Index Number

a. “The general character of the data included, e.g. ‘wholesale
prices’ or ‘retail prices’ of commodities, or ‘prices of stocks,’ or
‘wages,’ or ‘volume of production,’ etc.

b. “The specific character of data included, e.g. ‘foods,’ still further
specified as ‘butter,’ ‘beef,’ etc.

c. “Their assortment, e.g. a larger proportion of quotations of
meats than of vegetables.

d. “The number of quotations used, e.g. ‘22 commodities’ as in
the case of the Economist index number (until recently) as contrasted
with ‘1474 commodities’ as in the case of the War Industries
Board.

¹ Fisher, Irving, The Making of Index Numbers, Houghton Mifflin,
Boston, 1922, p. 10.

² Such a comparative study has been made by Professor Wesley C.
Mitchell in “Index Numbers of Wholesale Prices in the United States
and Foreign Countries,” Bulletin of the United States Bureau of Labor
Statistics, Whole Number 284, October, 1921. Acknowledgments are here
made of the indebtedness of the writer to Professor Mitchell for much of
the illustrative matter in this and the following chapters. An elaborate
analysis of a somewhat different kind has also been made by Professor
Fisher in his monumental study, The Making of Index Numbers, referred
to immediately above.
e. “The kind of mathematical formula employed for calculating
the index number, e.g. the ‘simple arithmetic average’ or the ‘weighted
geometric average,’ etc.

(2) “As to the Particular Times or Places to Which the Index
Number Applies

a. “The period covered, e.g. ‘1913-1918,’ or the territory covered,
e.g. certain specified cities of which the price levels are to be
compared.

b. “The base, e.g. the year 1913.

c. “The interval between successive indexes, e.g. ‘yearly’ or
‘monthly.’

(3) “As to the Sources and Authorities

a. “The agency which collects, calculates, and publishes the index
number, e.g. ‘Bradstreet’s’ or the ‘United States Bureau of Labor
Statistics.’

b. “The markets used, e.g. the ‘Stock’ or ‘Produce’ Exchanges of
‘New York’ or the ‘primary markets of the United States.’

c. “The sources of quotations, e.g. the ‘leading trade journals’ or
the books of business houses.

d. “The publications containing the index number, e.g. the Bulletin
of the United States Bureau of Labor Statistics.”¹

Mitchell approaches the problem somewhat differently. His
enumeration of the processes in making an index number is as
follows:

“(1) Defining the purpose for which the final results are to be
used; (2) deciding the numbers and kinds of commodities to be
included; (3) determining whether these commodities shall all be
treated alike or whether they shall be ‘weighted’ according to their
relative importance; (4) collecting the actual prices of the
commodities chosen, and, in case a weighted series is to be made,
collecting also data regarding their relative importance; (5) deciding
whether the form of the index number shall be one showing the
average variations of prices or the variations of a sum of actual
prices; (6) in case average variations are to be shown, choosing
the base upon which relative prices shall be computed; and (7)
settling upon the form of average to be struck, if averages are to
be used.

¹ Fisher, Irving, The Making of Index Numbers, Houghton Mifflin,
Boston, 1922, pp. 8-9.
“At each one of these successive steps choice must be made among
alternatives that range in number from two to thousands. The
possible combinations among the alternatives chosen are indefinitely
numerous. Hence there is no assignable limit to the possible
varieties of index numbers, and in practice no two of the known series
are exactly alike in construction. To canvass even the important
variations of method actually in use is not a simple task.”¹


2. DATA FROM WHICH PRICE INDEX NUMBERS ARE MADE

In a study of prices attention must first be centered upon
the commodities included and the conditions of price making.
Distinction will have to be made between producers’ and
consumers’ goods,² between raw and manufactured commodities,³
between manufactured goods bought by consumers for family
¹ Mitchell, Wesley C., “Index Numbers of Wholesale Prices in the
United States and Foreign Countries,” Bulletin of the United States
Bureau of Labor Statistics, Whole Number 284, October, 1921, p. 23.

² “. . . there are characteristic differences between the price
fluctuations of manufactured commodities bought by consumers for family use
and the price fluctuations of manufactured commodities bought by
business men for industrial or commercial use. . . . Though consisting more
largely of the erratically fluctuating farm products, the consumers’ goods
are steadier in price than the producers’ goods, because the demand for
them is less influenced by changes in business conditions.” Op. cit., pp. 46-48.

³ “These several comparisons establish the conclusion that
manufactured goods are steadier in price than raw materials. The manufactured
goods fell less in 1890-1896, rose less in 1896-1907, again fell less in
1907-1908, and rose less in 1908-1913. Further, the manufactured goods had
the narrower extreme range of fluctuations, the smaller average change
from year to year, and the slighter advance in price from one decade to
the next. It follows that index numbers made from the prices of raw
materials, or of raw materials and slightly manufactured products, must
be expected to show wider oscillations than index numbers including a
liberal representation of finished commodities.” Op. cit., p. 41.

“First, the list of commodities used by the Bureau of Labor Statistics
includes 29 quotations for iron and its products, 30 quotations for cotton
and its products, and 18 for wool and its products, besides 8 more
quotations for fabrics made of wool and cotton together. On the other hand
it has but 7 series for wheat and its products, 8 for coal and its products,
3 for copper and its products, etc. The iron, cotton, and wool groups
together make up 85 series out of 242, or 35 per cent of the whole
number. . . .

“Does this large representation of three staples distort these index
numbers, particularly the bureau’s series where the disproportion is
greatest? Perhaps, but if so the distortion does not arise chiefly from
the undue influence assigned to the price fluctuations of raw cotton, raw
wool, and pig iron. For, contrary to the prevailing impression, the
similarity between the price fluctuations of finished products and their raw
use and manufactured commodities bought by business men
for industrial uses,¹ between mineral products, animal products
and farm crops, ^ etc., the prices of all of which respond dif- 

Note 3, continued

materials is less than the similarity between the price fluctuations of
finished products made from different materials. . . . As babies from
different families are more like one another than they are like their
respective parents, so here the relative prices of cotton textiles, woolen
textiles, steel tools, bread, and shoes differ far less among themselves than
they differ severally from the relative prices of raw cotton, raw wool,
pig iron, wheat, and hides. Hence the inclusion of a large number of
articles made from iron, cotton, and wool affects an index number mainly
by increasing the representation allotted to manufactured goods. What
materials those manufactured goods are made from makes less difference
in the index number than the fact that they are manufactured. To
replace iron, cotton, and woolen products by copper, linen, and rubber
products would change the results somewhat, but a much greater change
would come from replacing the manufactured forms of iron, cotton, and
wool by new varieties of their raw forms.” Op. cit., pp. 48-50.

¹ “It has been found that among manufactured commodities those
bought for family consumption are steadier in price than those bought
for business use.” Op. cit., p. 51.

² “Third, there are characteristic differences among the price
fluctuations of the groups consisting of mineral products, forest products,
animal products, and farm crops. . . . Fifty-seven commodities are
included, all of them raw materials or slightly manufactured products.
Here the striking feature is the capricious behavior of the prices of
farm crops under the influence of good and bad harvests. The sudden
upward jump in their prices in 1891, despite the depressed condition of
business, their advance in the dull year 1904, their fall in the year of
revival 1905, their failure to advance in the midst of the prosperity of
1906, their trifling decline during the great depression of 1908, and their
sharp rise in the face of reaction in 1911 are all opposed to the general
trend of other prices. The prices of animal products are distinctly less
affected by weather than the prices of vegetable crops, but even they
behave queerly at times, for example in 1893. Forest-product prices are
notable chiefly for maintaining a much higher level of fluctuation in
1902-1913 than any of the other groups, a level on which their
fluctuations, when computed as percentages of the much lower prices of
1890-1899, appear extremely violent. Finally, the prices of minerals accord
better with alternations of prosperity, crisis, and depression than any
of the other groups. And the anomalies that do appear, the slight rise
in three years (1896, 1903, and 1913) when the tide of business was
receding, would be removed if the figures were compiled by months.
For the trend of mineral prices was downward in these years, but the
fall was not so rapid as the rise had been in the preceding years, so
that the annual averages were left somewhat higher than before. An
index number composed largely of quotations for annual crops, then,
would be expected at irregular intervals to contradict capriciously the
evidence of index numbers in which most of the articles were mineral,
forest, or even animal products.” Op. cit., pp. 44-46.
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ferently to conditions of scarcity and surplus.^ Obviously, a 
price index number which reflects price changes at large must
be made from samples of all commodity groups that are
affected in a peculiar manner. Similarly, in using an index
number prepared by another, one must satisfy himself respecting
the list of commodities used before he can be sure
what in reality the index measures.

But what is meant by “price”? Has one in mind retail
or wholesale price? price at what place? under what condition
of sale? to whom? price of what grade of commodity? on what
market? Are the “prices” contract, import, or market prices?
What is the wholesale or retail price of a commodity?

“We commonly speak of the wholesale price of articles like pig
iron, cotton, or beef as if there were only one unambiguous price
for any one thing on a given day, however this price may vary from
one day to another. In fact there are many different prices for every
great staple on every day it is dealt in, and most of these differences
are of the sort that tend to maintain themselves even when markets
are highly organized and competition is keen. Of course varying
grades command varying prices, and so as a rule do large lots and
small lots; for the same grade in the same quantities, different
prices are paid by the manufacturer, jobber, and local buyer; in
different localities the prices paid by these various dealers are not
the same; even in the same locality different dealers of the same
class do not all pay the same price to every one from whom they
buy the same grade in the same quantity on the same day. To find
what really was the price of cotton, for example, on February 1,
1920, would require an elaborate investigation, and would result in
showing a multitude of different prices covering a considerable range.

“Now the field worker collecting data for an index number must
select from among all these different prices for each of his
commodities the one or the few series of quotations that make the most
representative sample of the whole. He must find the most
reliable source of information, the most representative market, the
most typical brands or grades, and the class of dealers who stand in
the most influential position. He must have sufficient technical
knowledge to be sure that his quotations are for uniform qualities,
or to make the necessary adjustments if changes in quality have
¹ This topic has been given elaborate treatment by Professor Mitchell
in his Business Cycles (University of California, Memoirs, Vol. III,
September, 1913), pp. 93-109.
occurred in the markets and require recognition in the statistical
office. He must be able to recognize anything suspicious in the
data offered him and to get at the facts. He must know how
commodities are made and must seek comparable information concerning
the prices of raw materials and their manufactured products,
concerning articles that are substituted for one another, used in
connection with one another, or turned out as joint products of the
same process. He must guard against the pitfalls of cash discounts,
premiums, rebates, deferred payments, and allowances of all sorts.
And he must know whether his quotations for different articles are
all on the same basis, or whether concealed factors must be allowed
for in comparing the prices of different articles on a given date.”¹

If it is difficult to establish the price of a commodity at one 
time it is even more difficult to guarantee that the price de- 
termined at one time is the price at some other time. Condi- 
tions of marketing change, commodities change as to quality 
and salability, and price lists of identical commodities for 
any great length of time are frequently not available. The 
paucity of price data and the unwillingness of people to place 
any reliance in those extant were undoubtedly the main reasons 
for the relatively late development of index numbers.²

Today, of course, such data as those from which the index 
numbers currently published by the United States Bureau of
Labor Statistics are computed, are furnished by reputable firms 
and corporations, according to uniform instructions, on uniform 
blanks, and are carefully scrutinized by the agents of the 
Government. 

But how many commodities are necessary in order that an 
index number may indicate either the amount or effect of 
price change? From what regions should prices be drawn, 
and how frequently ought they to be recorded? Are prices 
quoted in standard and definite units?³ Some commodities

¹ Op. cit., pp. 25-26.

² Op. cit., p. 10.

³ “Often the form of quotation makes all the difference between a
substantially uniform and a highly variable commodity. For example, prices
of cattle and hogs are more significant than prices of horses and mules,
because the prices of cattle and hogs are quoted per pound, while the
prices of horses and mules are quoted per head.” Op. cit., p. 33.
are sensitive to conditions of demand and supply; others react
slowly under changed conditions. Some are vitally affected
by seasons, while others show appreciable change only in the
face of violent disturbance and exhibit a steady rise or fall
only over long periods. “Typical” price behavior can hardly
be predicted for any commodity. It may never occur.

What principles have been followed in the choice of commodities?
Are raw and manufactured commodities disproportioned?
Is a certain commodity unimportant for one purpose
(or important for another) represented in both its raw
and its manufactured state? How is the importance of a
commodity given weight? What test of importance is applied?
How is it measured? These are important questions which one
must answer for himself for every index number before he
uses it for a particular purpose.¹

“Difficult as it is to secure satisfactory price quotations, it is still
more difficult to secure satisfactory statistics concerning the relative
importance of the various commodities quoted. What is wanted is
an accurate census of the quantities of the important staples, at
least, that are annually produced, exchanged, or consumed. To
take such a census is altogether beyond the power of the private
investigators or even of the Government bureaus now engaged in
making index numbers. Hence the compilers are forced to confine
themselves for the most part to extracting such information as they
can from statistics already gathered by other hands and for other
purposes than theirs. In the United States, for example, estimates
of production, consumption, or exchange come from most
miscellaneous sources: The Department of Agriculture, the Census Office,
the Treasury Department, the Bureau of Mines, the Geological
Survey, the Internal Revenue Office, the Mint, associations of
manufacturers or dealers, trade papers, produce exchanges, traffic records
of canals and railways, etc. The man who assembles and compares
estimates made by these various organizations finds among them
many glaring discrepancies for which it is difficult to account. Such
conflict of evidence when two or more independent estimates of
^ Both for American and European index numbers such questions as
these and many more are answered in Bulletin of the United States
Bureau of Labor Statistics, Whole Number 284, to which reference has
so frequently been made.
the same quantity are available throws doubt also upon the seemingly
plausible figures coming from a single source for other articles.
To extract acceptable results from this mass of heterogeneous data
requires intimate familiarity with the statistical methods by which
they were made, endless patience, and critical judgment of a high
order, not to speak of tactful diplomacy in dealing with the authorities
whose figures are questioned.”

Mitchell, following an elaborate comparison of the various 
American index numbers, so far as choice of commodities and 
the importance assigned them are concerned, arrives at the 
following conclusions: 

“As for the small series made from the prices of foods alone or
from the prices of any single group of commodities, it is clear that
however good for special uses they may be, they are untrustworthy
as general-purpose index numbers.” *

“Large index numbers are more trustworthy for general purposes
than small ones, not only in so far as they include more groups of
related prices, but also in so far as they contain more numerous
samples from each group. What is characteristic in the behavior
of the prices of farm crops, of mineral products, of manufactured
wares, of consumers' goods, etc. — what is characteristic in the
behavior of any group of prices — is more likely to be brought out and
to exercise its due effects upon the final results when the group is
represented by 10 or 20 sets of quotations than when it is represented
by only one or two sets. The basis of this contention is
simple: In every group that has been studied there are certain
commodities whose prices seldom behave in the typical way, and no
commodities whose prices can be trusted always to behave typically.
Consequently, no care to include commodities belonging to all the
important groups can guarantee accurate results, unless care is also
taken to get numerous representatives of each group.”

3. DISPERSION OF PRICE FLUCTUATIONS ^

The trend of price change is generally in one direction for 
a considerable period. There are periods of falling and of

Op. cit., p. 26.
p. 53.

* Op. cit., pp. 58-59.

^ In this discussion a price index is used for purposes of illustration.
The treatment follows very closely that of Wesley C. Mitchell in Bulletin
of the United States Bureau of Labor Statistics, Whole Number 284.
rising prices. This, of course, does not mean that all prices
change in the same direction at the same time, nor that those 
which change together vary in the same degree.^ All that 
is meant is that in terms of a single year or an average of 
years taken as a base, the price level moves up or down 
through relatively long periods. The differences of price from 
the norm, whether negative or positive, generally tend to be in 
the same direction. Large differences, of course, are less com- 
mon than small ones, but those that are positive do not exactly 
compensate for those that are negative. Mitchell has shown 
this in a striking way by comparing the price variations of 
241 commodities in 1913, computed, first, as percentages of 
rise or fall from the prices in 1912; and second, as percentages
of rise or fall from the average prices of 1890-1899. Graphi- 
cally, Figure 86 ^ shows the percentage changes of rise and 
fall. 

The percentage differences — excesses and deficiencies of the 
1913 prices relative to the 1912 prices — arrange themselves, 
as shown by the solid line, about a norm, the arithmetic mean, 
the mode and the median tending closely to agree. 

“But the distribution of the second set of variations (percentages
of change from the average prices of 1890-1899) as represented by
the area inclosed within the dotted line has no obvious central
tendency; it shows no high degree of concentration around the
arithmetic mean (+30.4 per cent) or median (+26 per cent) and
it has a range between the greatest fall (52.2 per cent) and greatest
rise (234.5 per cent) so extreme that two of the cases could not be
represented on the chart.

“Price variations, then, become dispersed over a wider range and
less concentrated about their mean as the time covered by the 
variations increases. The cause is simple: With some commodities 
the trend of successive price changes continues distinctly upward 
for years at a time; with other commodities there is a consistent 

^ See Fisher, Irving, op. cit., Chapter II, for a discussion and for
various graphic illustrations of the dispersion of the prices of 36
commodities, 1913 to 1918. See also Figure 65, supra, p. 335, showing
price dispersion from 1891-1918.
^ Op. cit., p. 20.



492 STATISTICS AND STATISTICAL METHODS 

TABLE 87

Distribution of 5578 Cases of Change in the Wholesale Prices
of Commodities from One Year to the Next, according
to the Magnitude and Direction of the Changes

(Based upon the chain relatives in Table 11 of Bulletin of the Bureau
of Labor Statistics, No. 149)

Per Cent of Change = per cent of change from the average price of
the preceding year.

                     Rising Prices                               Falling Prices
-----------------------------------------------------   -------------------------
Per Cent   Number  Proportion   Per Cent   Number  Proportion   Per Cent   Number  Proportion
of Change  of      of Cases     of Change  of      of Cases     of Change  of      of Cases
           Cases                           Cases                           Cases

102-103.9      1    0.018       46-47.9       11    0.197       Under 2     *405    7.261
100-101.9      1    0.018       44-45.9       10    0.179       2- 3.9      *375    6.723
98- 99.9       —      —         42-43.9        6    0.108       4- 5.9       329    5.898
96- 97.9       —      —         40-41.9       14    0.251       6- 7.9      *238    4.267
94- 95.9       —      —         38-39.9       17    0.305       8- 9.9       200    3.585
92- 93.9       —      —         36-37.9       11    0.197       10-11.9      173    3.101
90- 91.9       —      —         34-35.9       18    0.323       12-13.9     *120    2.151
88- 89.9       —      —         32-33.9       17    0.305       14-15.9      107    1.918
86- 87.9       1    0.018       30-31.9       22    0.394       16-17.9       76    1.362
84- 85.9       1    0.018       28-29.9       30    0.538       18-19.9       71    1.273
82- 83.9       1    0.018       26-27.9       29    0.520       20-21.9       45    0.807
80- 81.9       1    0.018       24-25.9       47    0.843       22-23.9       39    0.699
78- 79.9       —      —         22-23.9       45    0.807       24-25.9       32    0.574
76- 77.9       —      —         20-21.9       65    1.165       26-27.9       17    0.305
74- 75.9       1    0.018       18-19.9       73    1.308       28-29.9       27    0.484
72- 73.9       4    0.072       16-17.9     *102    1.828       30-31.9       16    0.287
70- 71.9       1    0.018       14-15.9      106    1.900       32-33.9        7    0.125
68- 69.9       3    0.054       12-13.9      115    2.062       34-35.9       10    0.179
66- 67.9       4    0.072       10-11.9      167    2.994       36-37.9        7    0.125
64- 65.9       —      —         8- 9.9      *237    4.249       38-39.9        5    0.090
62- 63.9       —      —         6- 7.9       261    4.679       40-41.9        5    0.090
60- 61.9       4    0.072       4- 5.9      *356    6.382       42-43.9        4    0.072
58- 59.9       6    0.108       2- 3.9       355    6.364       44-45.9        2    0.036
56- 57.9       1    0.018       Under 2     *410    7.350       46-47.9        1    0.018
54- 55.9       3    0.054          —           —      —         48-49.9        1    0.018
52- 53.9       4    0.072       No change   *697   12.494       50-51.9        1    0.018
50- 51.9       1    0.018          —           —      —         52-53.9        —      —
48- 49.9       5    0.090          —           —      —         54-55.9        1    0.018

Summary

                    Number of Cases    Proportion of Cases
Rising prices            2,567               46.021
No change                  697               12.494
Falling prices           2,314               41.485
Total                    5,578              100.000

* Location of the deciles.
† Op. cit., p. 18.


downward trend; with still others no definite long-period trend ap- 
pears. In any large collection of price quotations covering many years 
each of these types, in moderate and extreme form, and all sorts of
crossings among them, are likely to occur. As the years pass by 
the commodities that have a consistent trend gradually climb far 
above or subside far below their earlier levels, while the other com- 
modities are scattered between these extremes. Thus the percentages 
of variation for any given year gradually get strung out in a long, 
thin, and irregular line, without a marked degree of concentration 
about any single point.”

The tendency for price changes, calculated from year to 
year, to arrange themselves around a central position — to conform
to the “normal law of error” — has been worked out by
Mitchell for the years 1891-1913 for 5578 cases. The price of 
each of more than 230 commodities during this period was 
expressed each year as a percentage of its price in the preced- 
ing year. The changes were then arranged in ascending order 
from the greatest decrease up through no change to the great- 
est increase. For the whole distribution deciles were then 
worked out for each year. With the changes arranged in this 
manner it is easy to measure the concentration about a norm, 
and to indicate the differences by successive deciles. Mitchell's
table showing the dispersion, and his comments concerning it 
are given in the footnote on page 330. 

The actual distribution of the changes for the 5578 cases 
is given in Table 87, and is compared with a “normal curve
of error” in Figure 87.

Op. cit., pp. 21-22.
FIGURE 87

Distribution of 5578 Price Variations

(Percentages of Rise or Fall from Prices of Preceding Year)



In commenting upon the form of this distribution and its 
relation to the normal error curve, Mitchell says: 

“There are several points to notice here. While the actual and
the 'normal' distributions look much alike, they are not, strictly
speaking, of the same type. The actual distribution is much more
pointed than the other, and has a much higher 'mode,' or point of
greatest density. On the other hand, the actual distribution drops
away rapidly on either side of this mode, so that the curve representing
it falls below the curve representing the 'normal' distribution.
The actual distribution is 'skewed' instead of being perfectly
symmetrical. The outlying cases of a 'normal' distribution extend
precisely the same distance from the central tendency in both directions,
whereas in the actual distribution the outlying cases run about
twice as far to the right (in the direction of a rise of prices) as
to the left (in the direction of a fall). This fact suggests that the
actual distribution would be more symmetrical if it were plotted
on a logarithmic scale, one which represents the doubling of one
price by the same distance from zero as the halving of another
price. Another aspect of the difference in symmetry is that the
central tendency about which the variations group themselves is
free from ambiguity in one case but not in the other. In the 'normal'
distribution this tendency may be expressed indifferently by the
median, the arithmetic mean, or the mode; for these three averages
coincide. In the actual distribution, on the contrary, these averages
differ slightly; the median and the 'crude' mode stand at ± 0, while
the arithmetic mean is + 1.36 per cent. These departures of the
actual distribution from perfect symmetry possess significance; but
the fact remains that year-to-year price fluctuations are highly
concentrated about their central tendency.” ^

The agreement between the distributions of price variations 
measured from year to year and the normal curve of error is 
important in the interpretation and calculation of index num- 
bers. Many index numbers are of the average-of-relatives 
type. That is, relatives or ratios based upon a fixed or chang- 
ing base are averaged in order to compare price changes from 
year to year. For this purpose the arithmetic mean is cus- 
tomarily used. But it is markedly affected by extremes. 
Accordingly, if the deviations from an average are not sym- 
metrically distributed about a norm or central position, the 
arithmetic mean is a poor measure of central tendency. If, on 
the other hand, distribution is normal, or approximately so, 
as in the case of the chain-relatives shown in Figure 87, then 
the arithmetic mean agrees with or is not markedly different 
from the median and the mode, and may be used to describe, 
as accurately as any single amount can — with this form of an
index number — the nature of price change.

Mitchell, after expressing price changes (1) on a remote
fixed base — 1890-1899 — and (2) on a year-to-year base,
concludes as follows:

^ Op. cit., pp. 18-19.
“The consequence is that the measurement of price fluctuations
becomes difficult in proportion to the length of time during which
the variations to be measured have continued. In other words,
the farther apart are the dates for which prices are compared, the
wider is the margin of error to which index numbers are subject,
the greater the discrepancies likely to appear between index numbers
made by different investigators, the wider the divergencies
between the averages and the individual variations from which they
are computed, and the larger the body of data required to give
confidence in the representative value of the results.” ^

Two important questions are raised by the above discussion: 
(1) should reliance be placed in an average of relatives index 
number, and (2) if a relative is used, what average should be 
employed? These questions are discussed immediately below. 

V. The Methods of Constructing Index Numbers

Illustrations of three major methods of constructing index 
numbers are given above by using wholesale prices of paper 
in Chicago. Each of them needs to be considered separately.

1. AVERAGES OF RELATIVES (RATIOS)

(1) Fixed vs. Shifting Base 

In order to compute an average of relatives, a base must be 
selected in terms of which to express the prices as percentages. 
In making a choice, two alternatives are presented, (1) a 
single year which is made common to all the series,^ and (2)
the preceding year changing from year to year. The first is
known as a fixed base; the second as a shifting base. When
relatives are computed in terms of a fixed year, the index is
known as a “fixed base relative”; when in terms of a shifting
year, and the resulting ratios are multiplied together, the
number is known as a “chain-relative.”

Table 79 shows such a fixed base relative — in terms of 1913 

^ Op. cit., p. 22.

* In some cases, the average price during a series of years is used. The
base, however, is fixed; that is, it applies to all of the years.
—the arithmetic mean, the median, and the geometric mean
being the averages in which the various changes are measured. 
Table 80 shows a chain-relative calculated upon the same 
base period. 

a. Arithmetic Means of Relatives — Fixed Base 

In computing a fixed base relative, some year or number of 
years which is thought of as normal is selected. By computing
the prices as percentages of this base, differences in prices as 
well as in the units in which the prices are quoted are sup- 
posed to be reduced to a common denominator so that they 
can be totaled and averaged. But as has been shown, the
dispersion of relatives computed upon a fixed base, more
particularly when it is remote, is large, and the distribution
skewed in the direction in which most prices are moving.
Arithmetic averages of relatives do not, under these conditions,
reflect the typical or modal movement. They are too much
affected by the extremes.

Moreover, the importance or weight assigned to the amount
of the change is inversely proportional to the magnitude of the
price in the base year. If prices change, dividing them by
the base price does not bring them to a comparable basis,
unless they all change at the same rate — which they do not
do in the case of the wholesale prices of paper, nor with the
prices used by Mitchell. Indeed, it is safe to say that uniformity
of change is never encountered. To add and take an
arithmetic mean of relatives gives too much weight to increasing
and too little weight to decreasing prices. Thus more
weight is given to rapidly rising than to slowly rising prices,
and more to rapidly falling than to slowly falling prices.

b. Medians of Relatives — Fixed Base 

But medians of fixed base relatives may be used rather than 
arithmetic means. What may be said in their favor? 
Medians are less affected by extreme items than are arithmetic 
means and, therefore, are likely to be more typical of price
changes. But (1) there may be no actual median items; ^ (2)
medians of different groups cannot be combined nor averaged;^
(3) they are not reversible, that is, index numbers
based upon them cannot be shifted from base to base by division;®
and (4) they are erratic when there are few items.^
Moreover, to take medians of relatives does not remove the
bias to which relatives are due in periods of rising and falling
prices. The bias here is due to the method of measuring the
change, not to the method of averaging it.


c. Geometric Means of Relatives — Fixed Base

Instead of using arithmetic means or medians of relatives,
geometric means may be employed. If the average ratio of
change in prices is to be measured, the geometric mean should
be used. This average gives equal influence to equal ratios
of change, irrespective of the previous level of the prices,
production, stocks, or what not to which the changes apply. The
doubling of one price, for instance, is exactly counterbalanced
by the halving of another when a geometric average of the
changes is taken. Accordingly, geometric means are always
smaller than arithmetic means of relatives.^ They may be
smaller or greater than medians of relatives.

An illustration will help to make the distinction clear between
measuring price changes by an arithmetic mean of relatives
and by a geometric mean of relatives.

^ Six of the medians had to be interpolated for in Table 79. While
few items are involved in this illustration, the difficulty encountered is
typical of medians. It does not occur only when few items are used.
See the discussion of the median and of interpolation, pp. 286-289.

^This follows because, in order to locate medians, items must be 
arranged in order of magnitude. 

®This follows because with a new base the order of the items will 
probably be different, therefore, giving a new median. 

* See the medians in Table 79 which are located by interpolation. 

®This condition would obtain in the illustration in Table 79 if full 
account were taken of decimal amounts. 
                 Actual Prices                 Relative Prices
Commodity   First Year   Second Year      First Year   Second Year
    A         $1.00         $2.00             100          200
    B           .50           .25             100           50

Change measured by

(1) the Arithmetic Mean of Relatives

                        First Year        Second Year
Sum of Relatives      =    200                250
Average of Relatives  =  200 ÷ 2 = 100    250 ÷ 2 = 125
Index Number          =    100                125

(2) the Geometric Mean of Relatives

                        First Year            Second Year
Average of Relatives  = √(100 × 100) = 100    √(200 × 50) = 100
Index Number          =    100                    100


Measured by the arithmetic mean of relatives, prices rose 25
per cent; by the geometric mean, they remained the same. 
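
The arithmetic of the illustration can be checked with a short sketch in Python. The function and variable names are illustrative assumptions and are no part of the original treatment; the relatives are those of commodities A and B given above.

```python
from math import prod

def arithmetic_mean(values):
    """Simple (unweighted) arithmetic mean."""
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return prod(values) ** (1.0 / len(values))

# Relatives of commodities A and B, first year = 100 (from the illustration).
first_year = [100.0, 100.0]
second_year = [200.0, 50.0]

# Index number for the second year (first year = 100) under each average.
arithmetic_index = 100 * arithmetic_mean(second_year) / arithmetic_mean(first_year)
geometric_index = 100 * geometric_mean(second_year) / geometric_mean(first_year)

print(round(arithmetic_index, 1))  # 125.0
print(round(geometric_index, 1))   # 100.0
```

The doubling of A and the halving of B cancel under the geometric mean but not under the arithmetic mean, which is the upward bias the text describes.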

Moreover, as pointed out by Mitchell, the geometric mean
“is not in danger of distortion from the asymmetrical distribution
of price variations.” ^ This fact is of real significance
since distributions of price fluctuations are skewed either positively
or negatively — positively during periods of rising prices,
and negatively during periods of falling prices — when calculated
on any other than a year-to-year base.^ Accordingly,
geometric means are closer to the modal change than are arithmetic
means, and the modal or typical change is of
interest when speaking of ...


(2) Chain-Relatives 

The distribution of relative prices calculated on the preceding
year as a base conforms more closely to the normal curve
of error than does that made from relatives computed on

^ Op. cit., p. 69.

^ See op. cit., p. 70, for a table showing the positive skewness of the
relative prices of 1437 commodities in 1918 on the base — July, 1913, to 
June, 1914. 
a remote fixed base. If the relative or percentage method is
to be used to measure price change, then a near base is to be 
preferred to one that is distant. Accordingly, link-relatives, 
which are later placed in a chain, are sometimes used for this 
purpose. But it is not easy to give a precise meaning to such 
a chain except at adjoining links.

When, for instance, the index number for paper prices in 
Chicago in 1921 is linked up through all of the changes from 
1913 to 1921, one is in doubt as to exactly what it measures. 
This method, however, makes it possible to drop old and to
add new commodities — a necessity frequently encountered
when computing a series of index numbers over a period of years.
But, as Mitchell shows, full agreement in price changes is not
to be expected by the use of the fixed and the chain base
methods.^
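
The disagreement between the fixed base and the chain methods can be seen in a minimal sketch in Python. The prices below are hypothetical and chosen only so that both commodities return in the third year to their first-year prices; the function names are likewise assumptions made for illustration.

```python
from math import prod

# Hypothetical prices of two commodities in three successive years.
prices = {"A": [1.00, 2.00, 1.00], "B": [1.00, 0.50, 1.00]}

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return prod(values) ** (1.0 / len(values))

def fixed_base_index(prices, average):
    """Average of relatives computed on the first year as a fixed base."""
    n_years = len(next(iter(prices.values())))
    return [100 * average([p[t] / p[0] for p in prices.values()])
            for t in range(n_years)]

def chain_index(prices, average):
    """Average the link relatives (each year on the preceding year),
    then multiply the averaged links together to form the chain."""
    n_years = len(next(iter(prices.values())))
    index = [100.0]
    for t in range(1, n_years):
        link = average([p[t] / p[t - 1] for p in prices.values()])
        index.append(index[-1] * link)
    return index

print([round(x, 2) for x in fixed_base_index(prices, arithmetic_mean)])  # [100.0, 125.0, 100.0]
print([round(x, 2) for x in chain_index(prices, arithmetic_mean)])       # [100.0, 125.0, 156.25]
print([round(x, 2) for x in chain_index(prices, geometric_mean)])        # [100.0, 100.0, 100.0]
```

With arithmetic means the chain stands at 156.25 in the third year although every price is back where it started; with geometric means the chain and the fixed base numbers agree in this example.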

(3) Base Shifting and the Use of Averages of Relatives
a. When Arithmetic Averages of Relatives are Used 

In order to shift the base when arithmetic means of relatives 
are used, two methods are available: (1) recomputing the 
relatives of each commodity on the new base and averaging 
their sum — that is, reconstructing the number; and (2) shifting
by the “short-cut” method. The first method gives a
number having all the properties of the old one but expressed
in another year as unity. The second method — which consists
in dividing the index number for other dates by the
figure chosen as the base — produces results which will not
necessarily agree with those which would be secured if relatives
were computed for each commodity on the new base.
As Mitchell says, 

“. . . For such recomputation usually alters considerably the relative
influence exercised upon the arithmetic means by the price

^ Mitchell, W. C., Bulletin 284, pp. 87-89. Compare the results in 
Table 86 secured by the fixed-base-relative and the chain-relative 
methods. 
fluctuations of certain commodities. Those articles which are cheaper
in the new than in the old base period get higher relative prices 
and, therefore, increased influence. Vice versa, articles that are 
dearer in the new base period get lower relative prices and, there- 
fore, diminished influence. Of course the short method of shifting 
the base, which retains the old relative prices, does not permit any 
such alteration in the influence exercised by the fluctuations of 
different commodities. Hence the two methods of shifting the base 
seldom yield precisely the same results. To present a series of 
arithmetic means shifted by the short method as showing what the 
index numbers would have been if they had been computed upon 
the new base is, therefore, misleading.”^ 
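
A minimal sketch in Python may make the discrepancy concrete. It assumes two commodities, A rising from $1.00 to $2.00 and B falling from $1.00 to $.50 between 1923 and 1924; the prices and the function name are chosen for illustration only.

```python
# Illustrative prices for two commodities in two years.
prices = {
    "A": {1923: 1.00, 1924: 2.00},
    "B": {1923: 1.00, 1924: 0.50},
}

def arithmetic_index(prices, base):
    """Arithmetic mean of relatives computed on the given base year."""
    years = sorted(next(iter(prices.values())))
    n = len(prices)
    return {y: 100 * sum(p[y] / p[base] for p in prices.values()) / n
            for y in years}

on_1923 = arithmetic_index(prices, 1923)

# Method (1): recompute every relative on the new base, 1924.
recomputed = arithmetic_index(prices, 1924)

# Method (2): the short-cut, dividing the old index numbers
# by the old index number of the new base year.
short_cut = {y: 100 * v / on_1923[1924] for y, v in on_1923.items()}

print(on_1923)     # {1923: 100.0, 1924: 125.0}
print(recomputed)  # {1923: 125.0, 1924: 100.0}
print(short_cut)   # {1923: 80.0, 1924: 100.0}
```

On these figures recomputation puts 1923 at 125 and the short-cut puts it at 80, so the two methods of shifting disagree, as the quotation warns.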


b. When Medians of Relatives Are Used 

When medians of relatives are used, shifting to a new base
is impossible without recomputing the relatives for the individual
commodities.^


c. When Geometric Means of Relatives Are Used 

Index numbers based upon geometric means of relatives can
be shifted from base to base without error. The same result
is secured by recomputing the commodity relatives and by
dividing by the new index base figure. An illustration will
make this clear.

Suppose the prices of two commodities were as follows: 

               Actual Prices         Relative Prices (1923 = 100)
Commodity      1923      1924            1923               1924
    A         $1.00     $2.00             100                200
    B          1.00       .50             100                 50

Geometric
Means   =                      √(100 × 100) = 100   √(200 × 50) = 100

Index
Numbers =                                 100                100

* Mitchell, W. C., “Index Numbers of Wholesale Prices in the United
States and Foreign Countries,” Bulletin 173, United States Bureau of
Labor Statistics, July, 1915, p. 39. See also the revision of this bulletin,
Number 284, pp. 83-85.

^ See the discussion of Medians of Relatives — Fixed Base, pp. 497-498.
Changing the base to 1924

(1) by recomputing the relatives

               Relative Prices (1924 = 100)
Commodity          1923                 1924
    A                50                  100
    B               200                  100

Geometric
Means   =   √(50 × 200) = 100    √(100 × 100) = 100

Index
Numbers =           100                  100

and (2) by dividing by the new base figure

1923 = (100 ÷ 100) × 100 = 100
1924 = (100 ÷ 100) × 100 = 100

Index
Numbers: 1923 = 100, 1924 = 100
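
Because both index numbers in the illustration stand at 100, the agreement of the two methods may look accidental. The following Python sketch repeats the comparison with a third commodity, C, added; C and its prices are hypothetical, as are the function names.

```python
from math import prod, isclose

prices = {
    "A": {1923: 1.00, 1924: 2.00},  # from the illustration
    "B": {1923: 1.00, 1924: 0.50},  # from the illustration
    "C": {1923: 2.00, 1924: 3.00},  # hypothetical, added for this sketch
}

def geometric_index(prices, base):
    """Geometric mean of relatives computed on the given base year."""
    years = sorted(next(iter(prices.values())))
    n = len(prices)
    return {y: 100 * prod(p[y] / p[base] for p in prices.values()) ** (1.0 / n)
            for y in years}

on_1923 = geometric_index(prices, 1923)

# (1) Recompute the commodity relatives on 1924.
recomputed = geometric_index(prices, 1924)

# (2) Divide the old index numbers by the figure for the new base year.
short_cut = {y: 100 * v / on_1923[1924] for y, v in on_1923.items()}

for year in (1923, 1924):
    print(year, round(recomputed[year], 2), round(short_cut[year], 2))
```

The two columns printed agree for each year, which is the property claimed for the geometric mean: the base can be shifted by simple division.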


2. RATIOS OF AVERAGE PRICES
(1) Merits of the Method 

The ratios of arithmetic averages of actual prices — the units
in which the quantities are priced being the same — do not 
have the bias inherent in arithmetic averages of relative 
prices, yet they are affected by the fact that the price for the 
same unit varies widely from commodity to commodity. For 
instance, the price in 1913 of “fine” paper is more than three 
times as important in determining the average (or the total) 
price for that year as is the price of “newspaper” for the same 
quantity. If the same proportions from year to year ob- 
tained among the different prices, the bias from this source 
would not enter. But they do not as is evident from an in- 
spection of Table 79. An “unweighted” ratio of averages index 
number accordingly is arbitrarily weighted.^ 

If the unit in which the prices were taken varied, then 
another occasion for bias would enter, because the price would 
in part depend upon the unit. For instance, if “newspaper” 
were quoted in tons, the price would be increased enormously 
and the averages for the different types be largely controlled 
by it. At least one of the early index numbers was made by 
totaling the prices of articles quoted in their customary
commercial units.

^ See the discussion of Bradstreet’s Index Number, Chapter XVI,
pp. 523-525. 
(2) Methods of Base Shifting Illustrated

Inasmuch as actual prices are averaged or totaled — the price
quotations having been reduced to the same unit — no base
period is involved. Any one of the years, however, may be 
chosen as a base and the average or total price for each of 
the other years be expressed as a percentage of it. More- 
over, the base can be shifted from year to year without error, 
provided the prices refer to the same source through the period. 
An illustration will show that this is the case. 

Price of Newsprint per 100 lbs.

Jobber                      1913       1920       1921
A                          $3.00     $11.00      $6.00
B                           3.25      11.20       6.40
C                           3.50      10.90       6.70
D                           2.75      12.10       6.30

Total Price               $12.50     $45.20     $25.40
Average Price              3.125      11.30       6.35
Relatives (1913 = 100)       100      361.6      203.2

It is desired to shift the base from 1913 to 1921. This may 
be done (1) by expressing the prices in 1913 and in 1920 as 
percentages of the price in 1921. The results by this method 
are as follows: 

   1913      1920      1921

   49.2     178.0     100.0

or (2) by multiplying through, thus:

1920 on 1921 base, (361.6 ÷ 203.2) × 100 = 178.00

1913 on 1920 base, (100 ÷ 361.6) × 100 = 27.65

Therefore, 1913 on 1921 = (178.0 × 27.65) ÷ 100 = 49.2, which is the
same result as is secured by the first method.
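
The same check can be sketched in Python, starting from the average prices in the table (3.125, 11.30, and 6.35). The function and variable names are assumptions made for illustration.

```python
from math import isclose

# Average price of newsprint per 100 lbs., from the "Average Price" row.
average_price = {1913: 3.125, 1920: 11.30, 1921: 6.35}

def relatives(values, base):
    """Express each year's figure as a percentage of the base year's."""
    return {year: 100 * v / values[base] for year, v in values.items()}

on_1913 = relatives(average_price, 1913)

# (1) Direct method: average prices as percentages of the 1921 average.
direct = relatives(average_price, 1921)

# (2) Shifting the 1913-base relatives by division.
shifted = relatives(on_1913, 1921)

for year in sorted(average_price):
    print(year, round(direct[year], 1), round(shifted[year], 1))
```

Since each relative is only the average price divided by a constant, dividing the relatives by one another returns the same ratios as dividing the average prices themselves, and the two columns printed agree.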





3. WEIGHTED AGGREGATES OF ACTUAL PRICES AND BASE SHIFTING

(1) Method of Computation and Relative Merits

The recent developments in the making of index numbers
have been toward the use of aggregates of actual prices
weighted by suitable quantities. The method consists in (1)
applying to the price of each commodity a quantity weight
indicative of its importance, (2) totaling the products, and (3)
expressing the results in the form of relatives on a base 
period. It was by this method that the index numbers for 
wholesale paper prices in Table 85 were computed.^ 

The advantages claimed for index numbers computed by 
this method may be summarized as follows: (1) they are 
easy to understand; (2) easy to compute; (3) do not require 
a base period for the calculation of relatives, but may be 
placed on a relative basis after the products are computed and 
totaled; (4) the base can be shifted at will without error; (5) 
they are not distorted during periods of rapid price change; 
and (6) they measure the change in the money cost of goods 
— the end most frequently desired from the use of an index
number. 

(2) Methods of Base Shifting Illustrated

The claim that the base in weighted aggregates of actual 
prices can be shifted at will without error needs to be demon- 
strated. For this purpose the index numbers calculated for 
paper prices in Chicago (Table 85) may be used as an illus- 
tration. The index numbers in the table are based on 1913. 
It is desired to shift the base to 1921. This may be done by 
dividing each of the price aggregates by the amount for the 
new base year. To illustrate: The index for 1919 on 1921,
(1037.9 ÷ 1155.5) × 100, is 90; that for 1913 on 1921,
(508.0 ÷ 1155.5) × 100, is 44.

^ See the discussion of the index numbers of the United States Bureau 
of Labor Statistics, Chapter XVI, pp. 516-518. 
With 1913 as the base, the index for 1919, (1037.9 ÷ 508.0) × 100,
is 204.

From the formula for 1919 on 1921, (1037.9 ÷ 1155.5) × 100 = 90,
and for 1919 on 1913, (1037.9 ÷ 508.0) × 100 = 204, it is possible
to get the index for 1913 on 1921 by simple division. Thus,
(1037.9 ÷ 1155.5) ÷ (1037.9 ÷ 508.0) = 508.0 ÷ 1155.5. That
× 100 = 44, which is the index
of 1913 on the base of 1921.
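
The three steps of the weighted aggregate method, and the shifting of its base by division, can be sketched in Python. The commodities, prices, and quantity weights below are hypothetical and are not the paper prices of Table 85; the function names are also assumptions.

```python
from math import isclose

# Hypothetical actual prices (per common unit) and fixed quantity weights.
prices = {
    "newsprint": {1913: 2.00, 1919: 4.00, 1921: 5.00},
    "fine paper": {1913: 7.00, 1919: 14.00, 1921: 15.00},
}
quantities = {"newsprint": 100, "fine paper": 20}
years = (1913, 1919, 1921)

def aggregate(year):
    """Steps (1) and (2): weight each price by its quantity and total."""
    return sum(prices[c][year] * quantities[c] for c in prices)

def index_numbers(base):
    """Step (3): express each aggregate as a percentage of the base year's."""
    return {y: 100 * aggregate(y) / aggregate(base) for y in years}

on_1913 = index_numbers(1913)

# Shifting to 1921 by recomputing from the aggregates ...
recomputed = index_numbers(1921)
# ... and by dividing the 1913-base numbers by the 1921 figure.
shifted = {y: 100 * v / on_1913[1921] for y, v in on_1913.items()}

print({y: round(v, 1) for y, v in recomputed.items()})  # {1913: 42.5, 1919: 85.0, 1921: 100.0}
print({y: round(v, 1) for y, v in shifted.items()})     # {1913: 42.5, 1919: 85.0, 1921: 100.0}
```

The agreement holds because the quantity weights are the same in every year, so that each index number is simply one aggregate divided by another.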

VI. Weighting 

1. MEANING AND METHODS OF WEIGHTING 

Distinction is generally made between weighted and un- 
weighted index numbers, but often without a clear idea of 
what is meant by the terms. Every index number is weighted
in some form. So-called “unweighted” series are generally
haphazardly weighted; while in those which are termed
weighted, the weights are selected according to some system- 
atic plan. 

If the average of relatives method is used, each item being 
counted once, the explicit^ weights are unity in each case. 
If, on the other hand, the weighted average of relatives 
method is followed, the weights are the values applied to each 
relative in order to secure the products to be averaged. On 
the other hand, if the weighted aggregate of actual prices 
method is used, the weights are the quantities which are ap- 
plied to the actual prices in order to get the products which are 
totaled into aggregates and later placed on a relative base.^ 
Weighting is effected in either of two ways: the first method

^Defined below. 

^To weight prices by values is illogical because the values in this 
case are the results of multiplying quantities by prices.
is in the selection of the commodities themselves — varying
emphasis being given by the number of
times a given article or one of the same general kind is included.
This may be called the “implicit” method. The
second way is to use some evidence of importance —
that is, to apply “explicit” weights.

The explicit weights commonly assigned to retail prices in
computing an index designed to measure changes in the cost
of living are the quantities of the articles consumed. Similarly,
the weights applied to wholesale prices, in the construction
of an index number, [...] total amounts of goods [...],
aggregate [...] consumed, values exchanged computed at the
price in the year the level of which is in question, etc. If the
changes in prices
which are being considered apply to securities rather than to 
commodities, then suitable explicit weights for different pur- 
poses might be the amounts outstanding, the earnings of the 
companies to which the securities apply, the dividend rates, 
etc. But the use of these different systems of weights pro- 
duces different results. So we are brought back to the ques- 
tion: What is it that weights are intended to do? 

Lack of attention to weights does not mean that weights 
are equal, but generally that they are haphazard. They are 
not necessarily bad because of this, nor good, as Mitchell 
points out, if they are consciously made.^ The real problem
for the maker of index numbers is whether he shall leave
weighting to chance or seek to rationalize it.

Moreover, so-called unweighted index numbers may in fact
be markedly weighted, as,
for instance, in the Aldrich index number, where 25 different
varieties of pocket knives were included, thus “giving this
trifling article an influence upon the result more than eight
times greater than given to wheat, corn, and coal put together.”
Truly to give each commodity equal weight requires careful
and studied attention to the choosing of positive
weights.

^ Op. cit., p. 60.
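The Aldrich example can be checked by counting. The sketch below assumes, purely for illustration, that wheat, corn, and coal were each quoted once; the source states only the 25 varieties of pocket knives and the resulting "more than eight times" influence.

```python
from collections import Counter


def implicit_weights(quoted_series):
    """Share of a simple average carried by each article, by count of quotations."""
    counts = Counter(quoted_series)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def influence_ratio(weights, item, others):
    """Influence of one article relative to a group of others put together."""
    return weights[item] / sum(weights[name] for name in others)


# Assumption: one quotation each for the three staples.
SERIES = ["pocket knives"] * 25 + ["wheat", "corn", "coal"]

if __name__ == "__main__":
    weights = implicit_weights(SERIES)
    ratio = influence_ratio(weights, "pocket knives", ["wheat", "corn", "coal"])
    print(f"pocket knives carry {ratio:.2f} times the weight of the three staples")
```

Under that assumption the ratio is 25 to 3, which agrees with the quoted statement that the knives count for more than eight times the three staples combined.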

But what test or tests of importance are available? Are 
they applicable at all times and places, and for all purposes? 
To weight a retail price index number — where the purpose of 
its computation is to measure the effect of price change on 
consumers — by the amount of production or by the value of 
the articles exchanged is ill fitting. Likewise, to weight whole- 
sale prices by statistics of family consumption is illogical. 
Weights should be appropriate or they should be dispensed
with entirely.

Of the relation of weights to purposes of index numbers,
Mitchell says: 

“If rational weighting is worth striving after, then by what method
shall the weights of the different commodities be arrived at? That
depends upon the object of the investigation. If, for example, the
aim be to measure changes in the cost of living, and the data be
retail quotations of consumers’ commodities, then the proportionate
expenditures upon the different articles as represented by collections
of family budgets make appropriate weights. If the aim be to study
changes in the money incomes of farmers, then the data should be
‘farm prices,’ the list of commodities should be limited to farm
products, and the weights should be proportionate to the total money
receipts from the several products. If the aim be to construct a
‘business barometer,’ the data should be prices from the most rep-
resentative wholesale markets, the list should be confined to com-
modities whose prices are most sensitive to changes in business
prospects and least liable to change from other causes, and the
weights may logically be adjusted to the relative faithfulness with
which the quotations included reflect business conditions. If the
aim be merely to find the differences of price fluctuation character-
istic of dissimilar groups of commodities, or to study the influence
of gold production or the issue of irredeemable paper money upon
the way in which prices change, it may be appropriate to strike a
simple arithmetic average of relative prices. If, on the other hand,
the aim be to make a general-purpose index number of wholesale
prices, the question is less easy to answer.”

* Op. cit., pp. 62-63.

But why use weights at all, when weighted results are so
strikingly the same as unweighted? Two main reasons are
usually assigned for ignoring them: (1) the difficulty of finding
suitable weights and of currently correcting them, and (2) the
fact that unweighted series are almost identical with those
which are weighted. Bowley, in much quoted passages, says:

“The discussion of the proper weight to be used . . . has oc-
cupied a space in statistical literature out of all proportion to its
significance, for it may be said at once that no great importance
need be attached to the special choice of weights; one of the most
convenient facts of statistical theory is that, given certain condi-
tions, the same result is obtained whatever logical system of weights
is applied.” *

“So we arrive at a very important precept; in calculating averages
give all care to making the items free from bias, and do not strain
after exactness in weighting.” *

But this is hardly a full statement of the case. Properly to
weight a number is to make it free from bias. This may be
done by assigning weights to the samples at hand or by the
more direct, but sometimes more difficult, method of choosing
more samples. In reality the two are alternatives, with this
difference that errors in prices will probably tend more nearly
to be compensating than those in weights. If a rational system
of weights does not change the result of an “unweighted”
average, then weights may be dispensed with; if it does, then
they ought to be used.

While the problem of selecting weights lends itself to theo- 
retical discussion, it is primarily of practical concern. To the 
person who desires to use index numbers the question cannot
be dismissed with the assertion that if weights are chosen
according to chance, weighted and unweighted indexes closely 
agree. As they are computed, weights are not always so 
chosen, numbers differ materially, and the merits of un- 
weighted and weighted numbers can be determined only by 

* Bowley, A. L., Elements of Statistics, 2d Ed., 1902, p. 113. 

* md,, p. 118. 
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comparison.^ In the light of the differences shown in this 
manner the merits of the two types of series must be deter- 
mined. The student and the business man cannot readily make 
these comparisons for themselves but they can be familiar 
with those that have been made. That “amiable weakness 
to take upon faith plausible figures that fill a pressing want^^ 
would not then be so common. 

Should weights be fixed or fluctuating? By changing them a
more accurate measure of importance is acquired,
but changes in an index must then be interpreted not only in
terms of prices but also in terms of weights. Conceivably,
some sort of an average of relative importance over a period
could be used, but if so, the variations would be lost sight of.
When chain-indexes are used, weights can be varied without
confusion, since price changes from year to year only are
measured. Such figures do not accurately measure changes
over a period.
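A chain index with changing weights can be sketched as follows. The data are invented, and the choice of weighting each yearly link by the earlier year's quantities is an assumption made for illustration; the text does not prescribe it.

```python
def aggregate_ratio(prices_from, prices_to, quantities):
    """Ratio of two price aggregates valued with one set of quantities."""
    later = sum(p * q for p, q in zip(prices_to, quantities))
    earlier = sum(p * q for p, q in zip(prices_from, quantities))
    return later / earlier


def chain_index(prices, quantities):
    """Link year-to-year ratios, each weighted by the earlier year's quantities."""
    level = 100.0
    series = [level]
    for year in range(1, len(prices)):
        level *= aggregate_ratio(prices[year - 1], prices[year], quantities[year - 1])
        series.append(level)
    return series


# Hypothetical prices and quantities for two goods over three years.
PRICES = [[1.0, 2.0], [1.5, 2.0], [1.5, 3.0]]
QUANTITIES = [[10, 5], [6, 8], [8, 4]]

if __name__ == "__main__":
    chained = chain_index(PRICES, QUANTITIES)
    direct = 100 * aggregate_ratio(PRICES[0], PRICES[2], QUANTITIES[0])
    print("chained:", [round(x, 1) for x in chained])
    print("direct, first-year weights:", round(direct, 1))
```

In this example the chained figure for the third year is about 165, while a direct comparison using first-year quantities gives 150. The gap illustrates why chained figures, each link of which is sound, need not measure the change over the whole period.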


2. WEIGHTING IN PROFESSOR FISHER’S “IDEAL” FORMULA

Professor Fisher,* by an elaborate analysis of the types of
bias by which index numbers computed from averages of
relatives of different kinds, and from aggregates of actual
prices are affected, concludes that a scheme of cross weight-
ing should be used. In this manner he claims to overcome
types of bias by which prices and quantities are affected.
He writes his formula as follows:

$$\sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$

where $\sum$ = “the sum of such terms as”

$p_1$ = the price of any commodity in a given year or
other period.

$q_1$ = the quantity of the commodity in the given year
or other period.

$p_0$ = the price of any commodity in the base year or
other period.

$q_0$ = the quantity of that commodity in the base year
or other period.

^ Weighted and unweighted series, and those weighted in various ways
both for commodities and stocks, are elaborately compared by Mitchell,
Wesley C., in “Critique of Index Numbers of Prices of Stocks” in The
Journal of Political Economy, July, 1916, passim; and Bulletin of the
United States Bureau of Labor Statistics, Whole Number 173, pp. 74-75.
See also Fisher, Irving, op. cit., where the effects of applying weights
are worked out in great detail.

* Fisher, Irving, op. cit., passim.

This formula requires both price and quantity (weights) 
for each year to which an index applies. As will be noted, 
there are four sets of aggregates required: (1) prices in the 
given year multiplied by quantities in the base year; (2) 
prices in the given year times quantities in the given year; 
(3) prices in the base year times quantities in the base year; 
and (4) prices in the base year times quantities in the given 
year. The first and second aggregates are divided by the 
third and fourth aggregates, respectively, giving two relatives 
which are then multiplied together, and the square root of 
the product extracted. 
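The four aggregates and the final square root can be traced in a short sketch. The prices and quantities are invented for two commodities, and the function names are illustrative only.

```python
from math import sqrt


def aggregate(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher_ideal(p0, q0, p1, q1):
    """Square root of the product of the two cross-weighted relatives."""
    first = aggregate(p1, q0)   # given-year prices, base-year quantities
    second = aggregate(p1, q1)  # given-year prices, given-year quantities
    third = aggregate(p0, q0)   # base-year prices, base-year quantities
    fourth = aggregate(p0, q1)  # base-year prices, given-year quantities
    return sqrt((first / third) * (second / fourth))


# Hypothetical data for two commodities.
P0, Q0 = [1.0, 4.0], [10, 5]
P1, Q1 = [2.0, 5.0], [8, 6]

if __name__ == "__main__":
    print(f"ideal index: {100 * fisher_ideal(P0, Q0, P1, Q1):.1f}")
```

Here the first relative is 45/30 and the second 46/32, so the result necessarily lies between 143.75 and 150 when expressed on a base of 100.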

In this formula — Fisher calls it the “Ideal” because it most
fully neutralizes the types of bias which he finds in measuring
changes in prices and in quantities — the form of weighting
is designed so that the index number secured will meet two
basic tests: viz., “time reversal” and “factor reversal.” The
time reversal test Fisher describes as follows:

“The test is that the formula for calculating an index number
should be such that it will give the same ratio between one point
of comparison and the other, no matter which of the two is taken as
the base.

“Or, putting it another way, the index number reckoned forward
should be the reciprocal of that reckoned backward.” *

By this he means that if an index shows that between 1913
and 1920, for instance, prices doubled, then it should show
that the level in 1913 was one-half of that in 1920 when meas-
ured from the latter year.

* Op. cit., p. 64.

Concerning the factor reversal test he says:

“Just as our formula should permit the interchange of the two
times without giving inconsistent results, so it ought to permit
interchanging the prices and quantities without giving inconsistent
results — i.e., the two results multiplied together should give the
true value ratio.” ^
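Both tests can be applied mechanically. The sketch below uses invented data and compares the cross-weighted formula with a simple base-weighted aggregate (included only as a contrast; the helper names are hypothetical).

```python
from math import isclose, sqrt


def aggregate(a, b):
    return sum(x * y for x, y in zip(a, b))


def base_weighted(p0, q0, p1, q1):
    """Aggregate of given-period prices over base-period prices, base weights."""
    return aggregate(p1, q0) / aggregate(p0, q0)


def fisher_ideal(p0, q0, p1, q1):
    return sqrt((aggregate(p1, q0) / aggregate(p0, q0))
                * (aggregate(p1, q1) / aggregate(p0, q1)))


def time_reversal_ok(formula, p0, q0, p1, q1):
    """Forward index times backward index should equal one."""
    return isclose(formula(p0, q0, p1, q1) * formula(p1, q1, p0, q0), 1.0)


def factor_reversal_ok(formula, p0, q0, p1, q1):
    """Price index times quantity index should equal the value ratio."""
    price_index = formula(p0, q0, p1, q1)
    quantity_index = formula(q0, p0, q1, p1)  # prices and quantities interchanged
    value_ratio = aggregate(p1, q1) / aggregate(p0, q0)
    return isclose(price_index * quantity_index, value_ratio)


# Hypothetical data for two commodities.
P0, Q0 = [1.0, 4.0], [10, 5]
P1, Q1 = [2.0, 5.0], [8, 6]

if __name__ == "__main__":
    for formula in (base_weighted, fisher_ideal):
        print(formula.__name__,
              time_reversal_ok(formula, P0, Q0, P1, Q1),
              factor_reversal_ok(formula, P0, Q0, P1, Q1))
```

On these figures the base-weighted aggregate fails both tests, while the cross-weighted formula meets both.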

It is unnecessary to enter into a discussion of the merits of 
this particular formula, or the question as to whether there 
is one formula which is best — “ideal” — for all purposes.* It
suffices for our purposes to call attention again to the fact
that the peculiar cross weighting is advised largely because it
equalizes different types of bias, thus definitely associating 
rather than contrasting “making the items free from bias” 
and “straining after exactness in weighting.” 

All index numbers are no longer considered to be equally 
good. Study of the methods of their construction, of the price 
fluctuations of different types of commodities, of bias, etc., 
has made the maker of index numbers critical. He is no 
longer satisfied with the crude methods of yesterday in the 
face of the specific findings of such students as Mitchell and 
Fisher. How about the attitude of the user? He is not so crit- 
ical, but he should be. After all, it is he who applies the num- 
bers to the different problems which he has to solve. It may be 
worth while, therefore, to offer in brief form some suggestions 
which will help him to make a discriminating application. 

^ Op. cit., p. 72.

* Its form, ease of calculation, suitability to different purposes, etc.,
have been the subject of vigorous controversy. See, for instance, Fisher,
op. cit., passim; Persons, W. M., Review of Economic Statistics, Pre-
liminary Volume 3, pp. 103-113 (May, 1921); Mitchell, op. cit., pp. 91-
93, and the references given.

VII. Suggestions to Users of Price Index Numbers

1. Before applying index numbers to specific problems, clearly
formulate a statement of the use which you have in mind.
In doing this it will be necessary to distinguish, among
other things, between

    (1) a general and a specific use.
    (2) different general uses.
    (3) different specific uses.

2. Distinguish between index numbers designed to measure
changes in the prices of

    (1) commodities sold at wholesale and at retail.
    (2) manufactured products and raw materials.
    (3) basic commodities in central markets, and farm
        products, for instance.
    (4) foods and the “total cost of living.”
    (5) goods with business “barometric” significance and
        those relating, for instance, to consumption.

3. Observe the methods according to which index numbers are
constructed, paying special attention to

    (1) the kinds of commodities included.
    (2) the source of information on prices.
    (3) the nature of the prices — as market, contract, im-
        port and export.
    (4) the number of commodities.
    (5) the kinds of weights used, and the source of in-
        formation.
    (6) the periods to which the index applies.
    (7) the base period, if any.
    (8) the type of average used, as arithmetic mean,
        median, geometric mean.

4. Avoid

    (1) shifting the base by the “short-cut” method when
        arithmetic means and medians of relatives are
        used.
    (2) confusing long and short period price trends.
    (3) confusing numbers which measure average ratios
        of change in price, and average change in amount
        of money required to buy a bill of goods.
    (4) confusing an index number of price and its recipro-
        cal, the purchasing power of the dollar.

5. Choose the index number which most fully meets the needs
in your particular case, but do not use it blindly. An
index number shows what it shows and nothing else.
What this is should and can be known by the user.
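Two of the cautions under item 4 can be illustrated numerically. In the sketch below the prices are invented, and the "short-cut" is taken to mean dividing an existing series by its value in the new base year rather than recomputing the relatives; that reading is an assumption.

```python
from statistics import mean

# Hypothetical prices of two commodities in three successive years.
PRICES = {"A": [1.0, 2.0, 4.0], "B": [1.0, 1.0, 1.0]}


def mean_of_relatives(prices, base, year):
    """Arithmetic mean of price relatives computed directly on the given base."""
    return mean(series[year] / series[base] for series in prices.values())


def shortcut_rebase(prices, old_base, new_base, year):
    """Short-cut: divide the old-base index by its value in the new base year."""
    return (mean_of_relatives(prices, old_base, year)
            / mean_of_relatives(prices, old_base, new_base))


def purchasing_power(index):
    """Purchasing power of the dollar as the reciprocal of a price index."""
    return 1 / index


if __name__ == "__main__":
    print("short-cut  :", round(shortcut_rebase(PRICES, 0, 1, 2), 4))
    print("recomputed :", round(mean_of_relatives(PRICES, 1, 2), 4))
    print("purchasing power at index 2.5:", purchasing_power(2.5))
```

The short-cut gives about 1.667 for the third year on the second-year base, whereas recomputing the relatives gives 1.5. A price index of 2.5 corresponds to a purchasing power of 0.4, which is why the two must not be confused.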

VIII. Conclusion 

In this chapter our aim has been (1) to show by concrete
examples the different methods of constructing index numbers,
(2) to explain and briefly to criticize each of the methods, and
(3) to offer some helpful suggestions to users of index num-
bers. Little more, however, has been done than to touch upon
the more important phases of the subject. Students should 
consult the painstaking studies of Fisher, Mitchell, and others 
if they wish really to understand the subject. 

This chapter is not a critique, but rather an exposition of 
the principles upon which a critique must be based. If an 
interest in index number making and using has been aroused, 
the main purpose of what has been written here will have 
been accomplished. After all, chief reliance must be placed 
in the scientific spirit and integrity of both maker and user. If
these are lacking, the use of statistics is without a logical
defense.

CHAPTER XVI


PRICE, QUANTITY, AND GENERAL BUSINESS 
INDEXES DESCRIBED AND COMPARED 

I. Introduction

The purpose of the preceding chapter was to illustrate the 
different methods by which index numbers of prices and of 
other phenomena may be computed, and to discuss the prin- 
ciples involved. The purpose of this one is to describe and 
compare the methods used in the more important public and 
private series. 

The treatment is of necessity brief. It includes only an 
outline of the methods peculiar to each type. While the facts 
presented are for the most part readily available, they are 
not generally kept in mind when index numbers are used. 
It may be helpful to the reader, therefore, to have at hand 
a brief account of the more important series. 

II. Index Numbers of Prices

American commodity ^ price index numbers may be divided 
into two groups: (1) those prepared by agencies of the United 
States Government, and (2) those issued by private organiza- 
tions. The more commonly known indexes from both sources 
are described in what follows: 

^ Excellent summaries under the headings, among others — history,
source of quotations, base period, number and class of commodities,
grouping, weighting, etc. — of foreign price index numbers are contained
in Bulletin 284 of the United States Bureau of Labor Statistics, pp. 175-
336.

1. PRICE INDEX NUMBERS ISSUED BY THE UNITED STATES
GOVERNMENT

(1) Index Numbers of Wholesale Prices 

a. The United States Bureau of Labor Statistics’ Wholesale 
Price Index Number ^ 

The systematic publication of a wholesale price index 
number by the United States Government was begun in 1902. 
The period first covered was 1890 to 1901, inclusive. This 
number was in continuation of the index compiled by the 
Department of Labor for the period 1890 to 1899, but included 
somewhat different commodities and carried the computations 
back to 1890. Since then, monthly and annual numbers have 
appeared regularly. 

Up to and including 1913, the index number was an average 
of relatives based upon the average price, 1890-1899. In
1914 a change was made to an aggregate of actual prices 
weighted according to the amount of goods placed on the 
market in 1909. The weights now used are the amounts of 
goods marketed in 1919. 

The change from an average of relatives to a weighted 
aggregate of actual prices method was made primarily because 
of (1) the difficulty of changing the base in averages of rela- 
tives without entirely recomputing the series; (2) a realiza- 
tion that an arithmetic average of relatives does not accurately 
measure typical price changes, more especially during periods 
of rapidly rising prices; ^ and (3) the conviction that a price 
series built up from actual money prices shows most accurately 
what the Bureau wanted to show — changes in the cost of “an
unvarying market basket.” 

The important details about the method now used by the
Bureau of Labor Statistics in computing its wholesale price
index number are as follows:

^ For a complete description of this index, see Bulletin of the United
States Bureau of Labor Statistics No. 326, Washington, D. C., March,
1923.

* See the discussion of Dispersion of Price Fluctuations, supra, p. 489 ff.

(a) The Price Quotations 

Prices of 450 commodities, obtained primarily from trade
journals, manufacturers, sales agents, trade bodies, etc., are 
collected systematically and regularly by the Bureau. Contact 
with the trade, a carefully prepared system of record cards 
providing methods for establishing the identity of commodi- 
ties, and editorial care guarantee substantial accuracy of the 
prices secured. So far as possible, the quotations are secured 
weekly from primary markets. 

(b) Types and Grouping of Commodities 

The 450 commodity quotations are divided into the follow-
ing groups — the numbers in parentheses representing the pro-
portions falling in each group: farm products (12.4); foods
(23.3); cloths and clothing (15.6); fuel and lighting (4.4);
metals and metal products (11.8); building materials (10.4);
chemicals and drugs (9.6); house furnishings (6.9); miscel-
laneous^ (5.6).

(c) The Method of Calculating the Index 

The average price ^ of each article for each year — 404 rather
than 450 are used in the index — is multiplied by the estimated
quantity of the article marketed in the census year 1919 — 
the amount in each case being checked against all available 
information. The products for the different commodities 
obtained in this manner are then added together. These dif- 
ferent computations give a series of values from which the index 
number for each year is calculated as a relative or percentage 
number, the value for 1913 being taken as a base or 100. 
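The calculation just described, fixed quantity weights applied to each year's prices with 1913 taken as 100, can be sketched as follows. The two commodities, their prices, and the quantities are invented; only the procedure follows the text.

```python
def aggregate_index(prices_by_year, weights, base_year):
    """Weighted aggregate of actual prices, expressed as relatives of the base year."""
    aggregates = {
        year: sum(p * w for p, w in zip(prices, weights))
        for year, prices in prices_by_year.items()
    }
    base = aggregates[base_year]
    return {year: 100 * value / base for year, value in aggregates.items()}


# Hypothetical average yearly prices of two articles.
PRICES = {1913: [1.0, 4.0], 1919: [2.0, 6.0], 1923: [1.5, 7.0]}
# Hypothetical quantities marketed in the census year 1919, used for every year.
WEIGHTS_1919 = [100, 20]

if __name__ == "__main__":
    for year, value in aggregate_index(PRICES, WEIGHTS_1919, base_year=1913).items():
        print(year, round(value, 1))
```

Because the same quantities are used for every year, changing the base to another year requires only dividing by a different aggregate, which is one of the advantages claimed for the method.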

* Cattle feed, leather, paper and pulp, other miscellaneous. 

* Average yearly prices are built up from average weekly and monthly
prices.

(d) The Form and Place of Publication

Monthly and annual index numbers for the commodity 
groups separately and combined, and reduced to relatives on 
the base, 1913, appear in The Monthly Labor Review, and in 
Wholesale Prices, both issued by the Bureau of Labor Statis- 
tics, Washington, D. C. 

b. The Federal Reserve Board’s Wholesale Price Index
Number ^ 

An index number of wholesale prices has been prepared by 
the Federal Reserve Board since October, 1918 — the series
being computed back to 1913. 

(a) The Price Quotations 

The price quotations are the same as those used by the 
United States Bureau of Labor Statistics in its wholesale series. 

(b) Types and Grouping of Commodities 

The commodities used are the same as those which make 
up the wholesale index of the United States Bureau of Labor 
Statistics, but they are grouped into three major classes, as 
follows: (1) raw materials, this group being further divided 
into farm products, animal products, forest products, and 
mineral products; (2) producers’ goods; and (3) consumers’
goods. 

(c) The Method of Calculating the Index

The method of calculation is the same as that used by the 
Bureau of Labor Statistics, that is, a weighted aggregate 
of actual prices, the weights being the estimated quantities of 
goods marketed in 1919. 

^ For a complete description of this index number see Bulletin 284 of
the United States Bureau of Labor Statistics, Washington, D. C., Octo-
ber, 1921, pp. 133-135.

(d) The Form and Place of Publication

Monthly and annual index numbers by commodity groups 
reduced to relatives on the base, 1913, appear monthly in The 
Federal Reserve Bulletin, Federal Reserve Board, Washington, 
D. C. 

c. The United States Department of Agriculture’s Wholesale 
Price Index Number of Farm Prices of Crops and of 
Livestock ^ 

(a) The Price Quotations 

The prices of the 30 commodities used in this index are 
those paid to producers as reported to the Division of Crop 
and Market Estimates of the Department. The prices refer 
to the 16th of each month. 

(b) The Types and Grouping of Commodities 

The prices cover (1) grains, (2) fruits and vegetables, (3) 
meat animals, (4) dairy and poultry, (5) cotton and cotton- 
seed, and (6) unclassified. 

(c) The Method of Calculation 

An average price for each commodity for the period August 
1909 to July 1914 is determined. The price for each com- 
modity is then multiplied by the average quantity of the corre- 
sponding commodity marketed in the period 1918 to 1923, and 
the resulting values added together to form an aggregate value 
for the base period. This is taken as 100. Similar aggregates 
are computed for each month and year, and expressed as 
relatives or percentages of the aggregates of the base period. 

^ A description of this index is contained in Crops and Markets,
Monthly Supplement, United States Department of Agriculture, August,
1924, p. 285.

(d) The Form and Place of Publication

This index number by months and years and by groups of 
commodities appears in the Monthly Supplement, Crops and 
Markets of the Department. 

(2) Index Numbers of Retail Prices 

If the collection of price data as a basis for the computation 
of a wholesale price index presents difficulties, as it undoubt- 
edly does, these are many times more serious in the case of 
price data for a retail price index. While retail prices may 
change more slowly than wholesale prices, may be less affected 
by trade disturbances, and may move further in either direc- 
tion after they are disturbed and be slower to regain their 
former position, it is these conditions and others, which make 
it so difficult to procure satisfactory price data over a period 
of time so as to measure the changes actually taking place.
Prices of some commodities fluctuate from day to day; others 
less susceptible to conditions of demand and supply show ap- 
preciable change within somewhat longer periods. Prices of 
the same commodity vary materially as between localities. 
Some commodities, standard in character, but peculiar to local 
markets and not possessing distinctive trade names, sell at 
widely different prices at the same time. 

a. The United States Bureau of Labor Statistics’ Index
Number of Food Prices 
(a) The Price Quotations 

From 1890 to 1907, the Bureau used 30 commodities. From 
1907 to 1913, this number was reduced to 15, and in 1914 and 
1915, respectively, the number was 17 and 21. Forty-three 
products are now used. 

Prices of these commodities, on the 15th of each month in 
most cases, are secured from retailers in 51 cities of the 
United States. The prices are taken as representative of food 
products generally.

(b) Types of Commodities

The 43 articles are distributed into 22 groups for the purpose 
of computing index numbers of price change. 

(c) The Method of Calculating the Index 

From the monthly quotations, the Bureau computes an 
average price for each article in each city, and in the 51 cities
combined. From these, relative prices or index numbers are 
computed for each article on the 1913 base price. For the 
index numbers showing prices in a city and for the United 
States as a whole, the prices are weighted according to the quan- 
tity of each article consumed by an average family during 
one year. The consumption weights (quantities) were secured 
from a comprehensive study made by the Bureau in 1918-1919. 

(d) The Form and Place of Publication 

Index numbers showing changes in food prices for groups of 
commodities, and for all articles combined, for the country as
a whole appear in the Monthly Labor Review. From time to
time, they are also shown separately by cities. 

b. The United States Bureau of Labor Statistics’ Index 
Number of Cost of Living 

An index number showing the “changes in the cost of liv- 
ing” has been published by the Bureau of Labor Statistics 
since 1918, although the data go back to December, 1914. 
This index is a composite of the changes in prices of things 
which make up the “cost of living.” 

(a) The Price Quotations 

The price quotations refer to commodities consumed by 
workingmen’s families, and are taken from representative firms 
and districts in industrial centers. Some of the quotations are 
submitted to the Bureau by storekeepers, while in other cases
the Bureau’s field agents collect the necessary data. The
problem of keeping the identity of commodities the same is
difficult, but essential uniformity is obtained by careful com-
parisons of grades, and by the Bureau’s specifying in detail the
qualities of the articles involved.

(b) The Types and Grouping of Commodities 

Prices are secured for six types of commodities or services: 
(1) food, (2) clothing, (3) rent, (4) fuel and light, (5) furni- 
ture and furnishings, and (6) miscellaneous items. 

(c) Method of Calculating the Index 

The average price of each article in each group — as food, 
clothing, etc. — is multiplied by a weight showing the quantity 
of the article consumed by a family in a year. The products 
are then totaled. The sums give the value of all of the articles 
in the group at the different periods to which the prices apply. 
In order to get a measure of the change in the price for the 
group from period to period, 1913 is selected as a base, or 100
per cent, in terms of which the values for other periods are 
expressed as percentages. The percentage changes in each of 
the groups are then weighted by factors according to their 
relative importance in the family budget, weights being based 
upon the result of a study of more than 12,000 family budgets 
in 92 localities in the United States.^ 
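The final step, combining the group percentages by their importance in the family budget, can be sketched as below. The group weights are those given in the accompanying footnote, reading the food weight as 38.2 per cent; the group index values are invented, and dividing by the sum of the weights is an assumption made because the published weights total 99.9 rather than 100.

```python
from math import isclose

# Group weights from the footnote (per cent of the family budget).
GROUP_WEIGHTS = {
    "food": 38.2,
    "clothing": 16.6,
    "rent": 13.4,
    "fuel and light": 5.3,
    "furniture and furnishings": 5.1,
    "miscellaneous": 21.3,
}


def cost_of_living(group_indexes, weights=GROUP_WEIGHTS):
    """Weighted average of group indexes (each on the base 1913 = 100)."""
    total_weight = sum(weights[group] for group in group_indexes)
    weighted_sum = sum(weights[group] * value for group, value in group_indexes.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical case: food has doubled since the base year, all else unchanged.
    example = {group: 100.0 for group in GROUP_WEIGHTS}
    example["food"] = 200.0
    print(round(cost_of_living(example), 1))
```

In the hypothetical case shown, a doubling of food prices alone raises the combined index by a little over 38 per cent, in proportion to the share of food in the budget.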

(d) The Form and Place of Publication 

Changes in cost of living for the country as a whole and for
different cities are published in the Monthly Labor Review,
United States Bureau of Labor Statistics.

* The group weights are as follows: food, 38.2 per cent; clothing, 16.6
per cent; rent, 13.4 per cent; fuel and light, 5.3 per cent; furniture and fur-
nishings, 5.1 per cent; and miscellaneous, 21.3 per cent. The National In-
dustrial Conference Board, New York City, publishes a similar index
number of cost of living, the group weights being as follows: for food,
43.1 per cent; for shelter, 17.7 per cent; for clothing, 13.2 per cent;
for fuel and light, 5.6 per cent; for sundries, 20.4 per cent. See Carr,
Elma, “Cost of Living Statistics of the United States Bureau of Labor
Statistics, and (of) the National Industrial Conference Board,” Journal
of the American Statistical Association, December, 1924, pp. 484-507.

2. WHOLESALE PRICE INDEX NUMBERS ISSUED BY PRIVATE 
ORGANIZATIONS 

A number of private organizations in the United States pre- 
pare index numbers of wholesale prices. These originally grew
out of some particular need or were designed for some special 
purpose in connection with market analysis, special trade or 
financial publications, etc. While, in general, less is known 
about them than about the public series prepared by govern- 
mental agencies, they are widely used, quoted, and relied upon 
to measure price changes. Those best known are briefly de- 
scribed below. 


(1) Bradstreet’s Index Number ^

Bradstreet’s wholesale index number is published monthly
as a total price of 96 articles reduced to a per-pound basis. 


a. The Price Quotations 

Little is known about the source of the quotations but the 
compilers say they are secured from central markets. 

b. The Types and Grouping of Commodities 

The articles are divided into 13 groups as follows: (1) 
breadstuffs, (2) live stock, (3) provisions and groceries, (4) 
fresh and dried fruits, (5) hides and leather, (6) raw and 
manufactured textiles, (7) metals, (8) coal and coke, (9) 
mineral and vegetable oils, (10) naval stores, (11) building 
materials, (12) chemicals and drugs, and (13) miscellaneous. 

^ See “Comparison of Methods Used in Constructing Index Numbers of
Wholesale Prices,” Monthly Labor Review, September, 1920, pp. 65-70.
This is a comparison of the methods used by the Bureau of Labor Sta-
tistics, the Annalist, Bradstreet, and Dun.





c. Method of Calculating the Index 

The index number for each of the thirteen groups is the sum 
in dollars and cents of the average price per pound of the 
articles included. The index for all of the commodities is the 
sum of the indexes for the groups, and the yearly number 
the average of the monthly numbers. No base is used, and it 
is not clear from the descriptions contained in Bradstreet’s
whether the prices are averages of extremes or something else. 
Moreover, the sources of the quotations are not disclosed, nor 
is the method described by which interpolations are made for 
missing data. 

Weights are not used, except as they appear in the process 
of reducing all quantities to a price-per-pound basis. This, of 
course, results in employing a — 

“. . . curious combination of rational and irrational weights. The
rational element consists in the inclusion of several quotations for
important articles like pig iron, coal, lumber, and hog products,
and only one quotation for articles like lemons, tea, and flax. The
irrational element results from the reduction of all the original
quotations to prices per pound. On April 1, 1897, these prices per
pound ranged from $0.0008 for soft coal and coke to $0.52 for quick-
silver and $0.83 for rubber. Recognition of the excessive influence
upon the results accorded to these high-priced articles presently led
the computers to drop them from the index number; but they seem
to have retained articles like alcohol and Australian wool which in
1897 cost $0.33 and $0.49 per pound — 400 and 600 times as much
as soft coal and coke.” ^
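The effect of summing prices per pound can be seen from the 1897 figures quoted above. The sketch treats the five quoted articles as if they made up a whole index, which is a simplification for illustration and not Bradstreet's actual list.

```python
# Prices per pound on April 1, 1897, as quoted in the passage above.
PRICE_PER_POUND_1897 = {
    "soft coal and coke": 0.0008,
    "alcohol": 0.33,
    "Australian wool": 0.49,
    "quicksilver": 0.52,
    "rubber": 0.83,
}


def per_pound_index(prices):
    """Index formed as the plain sum of prices per pound."""
    return sum(prices.values())


def effect_of_rise(prices, article, per_cent):
    """Change in the summed index when one article rises by the given per cent."""
    raised = dict(prices)
    raised[article] *= 1 + per_cent / 100
    return per_pound_index(raised) - per_pound_index(prices)


if __name__ == "__main__":
    coal = PRICE_PER_POUND_1897["soft coal and coke"]
    for article, price in PRICE_PER_POUND_1897.items():
        print(f"{article}: {price / coal:.1f} times the price of coal per pound")
    print("10% rise in rubber moves the sum by",
          round(effect_of_rise(PRICE_PER_POUND_1897, "rubber", 10), 5))
    print("10% rise in coal moves the sum by",
          round(effect_of_rise(PRICE_PER_POUND_1897, "soft coal and coke", 10), 5))
```

The ratios for alcohol and wool come to 412.5 and 612.5, in agreement with the "400 and 600 times" of the quotation, and an equal percentage rise in rubber moves the sum roughly a thousand times as far as the same rise in coal.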

d. The Form and Place of Publication 

The index is published in Bradstreet’s both as monthly and
as annual numbers, the edition shortly after the beginning of
each year giving a convenient review by years, months, and
groups of commodities.

^ Bulletin of the United States Bureau of Labor Statistics, Whole
Number 173, p. 101. Another writer in speaking of Bradstreet’s
method of weighting, says, “Illogical as this system may seem, however,
it does not give the erratic results one might expect, because it is in
part negatived by varying the number of commodities of each group;
that is, few commodities are used in those classes of goods having high
values per pound, while many are used where value per pound is low.
The Bradstreet’s weighting system, then, while on its face almost ridic-
ulous, is not nearly so bad as it looks.”

(2) Dun's Index Number^

a. The Price Quotations 

Dun's index number is based upon the wholesale prices of
about 200 commodities^ taken from the principal markets of
the United States. 

b. The Types and Grouping of Commodities 

The commodities included are divided into the following
groups:^ (1) breadstuffs, (2) meats, (3) dairy and garden
products, (4) other food, (5) clothing, (6) metals, (7) miscellaneous.


c. The Method of Calculating the Index 

The index numbers are computed by (1) multiplying the 
price of each article by the annual per capita consumption, 
(2) totaling the products in each group to give the group 
index, and (3) totaling the group indexes to get the total 
index number. Concerning the method used, Dun's Review
of May 9, 1914, says: 

^ See reference in note 1, p. 523. 

^ In a pamphlet entitled “Commodity Prices, a Record Covering a
Period of Over Half a Century,” taken from Dun's Review, January 1,
1919, it is said that “about 300 wholesale quotations are taken.”

^ “Breadstuffs include quotations of wheat, corn, oats, rye, and barley,
besides beans and peas; meats include live hogs, beef, sheep, and various
provisions, lard, tallow, etc.; dairy and garden include butter, eggs,
vegetables and fruits; other foods include fish, condiments, sugar, rice,
tobacco, etc.; clothing includes the raw material of each industry, and
quotations of woolen, cotton and other textile goods, as well as hides
and leather; metals include various quotations of pig iron, and partially
manufactured and finished products, as well as minor metals, coal and
petroleum. The miscellaneous class embraces many grades of lumber,
and also lath, brick, lime, glass, turpentine, hemp, linseed oil, paints,
fertilizers and drugs.” Dun's Review, January 10, 1925, p. 11.

“Quotations of all the necessaries of life are taken and in each
case the price is multiplied by the annual per capita consumption,
which precludes any one commodity having more than its proper
weight in the aggregate. Thus, wide fluctuations in the price of an
article little used do not materially affect the ‘index,’ but changes
in the great staples have a large influence in advancing or depressing
the total. . . . The per capita consumption used to multiply
each of many hundreds of commodities does not change. There
appears to be much confusion on this point, but it should be seen
at a glance that there would be no accurate record of the course of
prices if the ratio of consumption changed. It was possible, however,
to obtain figures sufficiently accurate to give each commodity
its proper importance in the compilation. This was done by taking
averages for a period of years when business conditions were normal
and every available trade record was utilized, in addition to official
statistics of agriculture, foreign commerce, and census returns of
manufactures.”
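
The three steps given for Dun's number amount to an aggregate of prices weighted by fixed quantities. The following Python sketch shows only the mechanics; every price and consumption figure in it is hypothetical, since the text does not reproduce Dun's actual weights.

```python
# Fixed annual per capita consumption (hypothetical quantities).
PER_CAPITA_CONSUMPTION = {"wheat": 5.0, "corn": 2.0, "beef": 60.0}

# Hypothetical wholesale prices, arranged by group as in the text.
PRICES = {
    "breadstuffs": {"wheat": 1.10, "corn": 0.80},
    "meats": {"beef": 0.15},
}


def group_index(group_prices, consumption):
    """Steps (1) and (2): price times per capita consumption, totaled."""
    return sum(price * consumption[name] for name, price in group_prices.items())


def total_index(prices_by_group, consumption):
    """Step (3): the total of the group indexes."""
    return sum(group_index(group, consumption) for group in prices_by_group.values())
```

Since the consumption figures are held constant, any change in the total from one date to another is due to prices alone, which is the point made in the passage quoted above.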

The characteristics of this index number are further described
by Dun's Review of January 10, 1925, as follows:

“It is timely to point out . . . that wholesale quotations only are
used as a basis for the figures given, no attempt having been made
here to measure the fluctuations in retail prices. The latter usually
vary so considerably in different sections of the same city that
satisfactory comparisons are difficult, if not impracticable. Nearly all
barometers of price trends are based on wholesale quotations, and
Dun’s Index Number has the scientific foundation of making
allowance for the relative importance of each of the many items that
comprise the record. Obviously, some commodities enter more
largely into consumption than others, and in computing an index
number, a distinction should be made between a staple that is widely
consumed and another article the per capita consumption of which
is small. In an index number where such an allowance is not made,
it follows that some articles will have a disproportionate influence
upon the total, while others will not have their proper weight in
the general result.”

d. The Form and Place of Publication 

This number appears regularly in Dun's Review, New York.
In the annual number, convenient summaries are given, showing
price changes for commodities by groups, by months and years.

(3) The New York Annalist's Index Number^

The Annalist, a New York financial journal, computes a
wholesale price index number based upon 25 food products.
In the issue for January 5, 1925, this number is described
as showing “the food cost of living.”

a. The Price Quotations 

The quotations are taken from Chicago and New York 
markets and are chosen, it is claimed, so as to represent a theo- 
retical family budget. 

b. The Types of Commodities 

The following commodities are included: steers, hogs, sheep,
beef (fresh), mutton (dressed), beef (salt), pork (salt), bacon,
codfish (salt), lard, potatoes, beans, flour (rye), flour (wheat,
spring), flour (wheat, winter), corn meal, rice, oats, apples
(evaporated), prunes, butter (creamery), butter (dairy),
cheese, coffee, sugar (granulated).

c. The Method of Calculating the Index 

The Annalist index number is an average of relatives, the 
steps in its computation being (1) to express the price of each 
article each period as a relative with its average price 1890- 
1899 as a base, (2) to sum the relatives, and (3) to take an 
arithmetic mean. No explicit weighting is used, the different
commodities affecting the result in proportion to their relative
increase or decrease as compared to the base period.^
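
The steps of the Annalist computation can be sketched as follows. The two articles and their prices are hypothetical; only the procedure (relatives on the 1890-1899 average, summed and averaged without explicit weights) is taken from the description above.

```python
def price_relative(price, base_price):
    """Step (1): the price as a percentage of its 1890-1899 average."""
    return 100.0 * price / base_price


def average_of_relatives(prices, base_prices):
    """Steps (2) and (3): sum the relatives and take the arithmetic mean."""
    relatives = [price_relative(prices[name], base_prices[name]) for name in prices]
    return sum(relatives) / len(relatives)


# Hypothetical prices for two of the 25 articles.
BASE_PRICES = {"butter": 0.20, "lard": 0.10}
CURRENT_PRICES = {"butter": 0.30, "lard": 0.12}
```

Here the relatives are 150 and 120, and the index is their mean, 135. Each article counts equally, whatever its importance in consumption.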

d. The Form and Place of Publication 

Weekly, monthly, and yearly numbers in the form of rela- 
tives are published currently in the weekly numbers of the 
journal. 

^ See reference in note 1, p. 523. 

^ See the criticism of this method, supra, pp. 480-493, 497.

(4) Professor Fisher's Index Number

Professor Irving Fisher of Yale University publishes weekly 
through a syndicate of American newspapers an index number 
of wholesale prices in the United States, and its reciprocal,
the purchasing power of the dollar. The series was begun in
the first week of January, 1923, a number each week from that 
time to date being available. 

a. The Price Quotations 

The quotations are taken from Dun's Review. In the
beginning, 200 commodities were used; recently, however, this
number has been increased to 205.

b. The Types of Commodities 

The 205 commodities may be distributed in the following
groups (the numbers in parentheses showing the percentage
of the total in each group): food (45.9); clothing and cloths
(16.9); paper, rubber, and fibers (2.3); metals (9.5); fuels
(15.9); building materials (5.9); chemicals (3.6). Separate
indexes for the groups, however, are not published.

c. The Method of Calculating the Index 

The method of calculation is now as follows: the price 
of each article is multiplied by the quantity of that article 
sold in 1919, the United States Bureau of Labor quantities
being used. The sums of the products for each week, month, 
or year, therefore, may be thought of as giving the total value 
of the articles sold at prices for the period and in quantities 
corresponding to those in 1919. The index numbers, however, 
are issued as relative or percentage numbers with 1913 as the 
base.^ In this form they show “the relative value, from week

^ In order to put the articles on the 1913 base, the 1923 series was
equated on the basis of the Bureau of Labor index number (156 = 1913) for 
the week ending November 17, 1922. With the change in Fisher’s number 
made in 1924, a further equating was necessary. His 1924 series is equated
to his own number (151.9) for the week ending November 16, 1923.
to week, of a cargo of the 205 commodities in the above
specified quantities.”^
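
A rough Python sketch of this aggregate method follows. The commodities, quantities, and prices are hypothetical. The figure 156 is the Bureau of Labor index number mentioned in the footnote; reading "equated" as a simple proportional adjustment of the aggregate to that figure in the linking week is an assumption made here for illustration.

```python
def aggregate_value(prices, quantities):
    """Value of the fixed 'cargo': the sum of price times 1919 quantity."""
    return sum(prices[name] * quantities[name] for name in quantities)


def relative_on_1913_base(prices, link_prices, quantities, link_value=156.0):
    """Scale the aggregate so that the linking week equals link_value."""
    current = aggregate_value(prices, quantities)
    link = aggregate_value(link_prices, quantities)
    return link_value * current / link


# Hypothetical quantities and prices for two commodities.
QUANTITIES_1919 = {"wheat": 10.0, "coal": 100.0}
LINK_WEEK_PRICES = {"wheat": 1.00, "coal": 0.05}
CURRENT_PRICES = {"wheat": 1.20, "coal": 0.06}
```

With these figures the cargo is worth 15 in the linking week and 18 currently, so the relative is 156 x 18 / 15, or 187.2.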

Previous to the 1924 revision, class weights were also used. 
These were chosen because it was impossible to get sufficient 
quotations for some of the commodities. Accordingly,
correction factors or class weights were applied to the quantity
weights. These, however, have been dispensed with in the 
1924 revision except in the case of the chemical group. The 
quantity weights in this group are increased by one-half. 

d. The Form and Place of Publication ^ 

Fisher’s series is published each Monday morning in the 
more important metropolitan newspapers. It appears in two 
forms: (1) as relative numbers based on 1913, and (2) as 
cents showing the purchasing power of the 1913 dollar. The
second series is obtained by dividing the first series into one
and multiplying by 100. That is, its amounts are the
reciprocals of the relatives.
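
The reciprocal relation can be sketched in a line of Python. The scaling is an interpretation: with the relative written as a percentage (1913 = 100), the result is expressed in cents by dividing 10,000 by the relative, which is the same as taking the reciprocal of the ratio form and multiplying by 100. The function name is hypothetical.

```python
def purchasing_power_cents(relative):
    """Purchasing power of the dollar in 1913 cents, given a price
    relative on the base 1913 = 100 (assumed scaling, see lead-in)."""
    return 100.0 * 100.0 / relative
```

A relative of 200, for example, corresponds to a dollar worth 50 cents in terms of 1913 prices, and a relative of 100 to a dollar worth 100 cents.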

(5) The Commodity Price Index of Business Cycles of the
Harvard Committee on Economic Research^

The purpose of this index of wholesale prices is to measure 
changes in general business conditions. It is not intended to 
measure changes in the level of prices nor the effect of the 
changes on cost of living, the two purposes for which index
numbers are generally computed.^

^ Fisher, Irving, “Revision of the Weekly Index Number,” Journal of
the American Statistical Association, September, 1924, pp. 336-347 at
p. 340. The reference in the quoted part is to the individual quantity
weights corresponding to the different commodities.

^ The list of commodities used in 1923 and in 1924 together with the
quantity weights are shown in Fisher, op. cit., pp. 341-343. This article
also explains in detail the method followed including the adjustments
made in 1924.

^ This index is fully described in Persons, W. M., and Coyle, Eunice S.,
“A Commodity Price Index of Business Cycles,” The Review of Economic
Statistics, Preliminary Volume 3, Number II, November, 1921,
pp. 353 to 369.

^ See supra, pp. 480-481.

a. The Price Quotations

From an analysis of the fluctuations in the prices of a large
number of commodities, 10 “varied in nature, important in
industry, unusually sensitive in price, not greatly affected by
the seasons, and similar with respect to their main cyclical
price movements”^ were selected.

b. The Types of Commodities 

The commodities used are as follows: (1) cottonseed oil, 
(2) coke, (3) spelter, (4) pig iron, (5) bar iron, (6) mess pork, 
(7) hides, (8) print cloths, (9) sheetings, and (10) worsted 
yarns. 

“Instead of including a large number of commodities, a few of
which have great influence but most of which have little influence
on the result, it is better for our purpose to include a limited number
of carefully selected commodities with homogeneous cyclical price
movements.”^


c. The Method of Calculating the Index 

The method of calculating the index is to take an un- 
weighted geometric mean of the prices of the 10 commodities 
relative to their geometric average price in the base period, 
1890-1899. 
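
A minimal Python sketch of an unweighted geometric mean of relatives is given below. The two commodities and all prices are hypothetical, and the base period is reduced to two observations to keep the arithmetic visible; the actual index uses 10 commodities and the period 1890-1899.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def commodity_price_index(prices, base_period_prices):
    """Unweighted geometric mean of prices relative to the geometric
    average price of each commodity in the base period."""
    relatives = [
        prices[name] / geometric_mean(base_period_prices[name]) for name in prices
    ]
    return 100.0 * geometric_mean(relatives)


# Hypothetical base-period and current prices.
BASE_PERIOD = {"coke": [2.0, 8.0], "hides": [1.0, 1.0]}
CURRENT = {"coke": 8.0, "hides": 2.0}
```

The base averages here are 4 and 1, both relatives equal 2, and the index is 200.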


d. The Form and Place of Publication 

Monthly and annual index numbers of business cycles from 
1890 to September, 1921, are contained in The Review of 
Economic Statistics,^ and current numbers in Statistical
Record, Harvard Economic Service, Cambridge, Mass.

^ Persons, W. M., and Coyle, Eunice S., loc. cit., p. 353.

^ Op. cit., p. 356.

^ Loc. cit., p. 369.

III. Index Numbers of Production

During the World War it became apparent that index numbers
of price changes did not truly represent changes in industrial
and business conditions. The “dollar” became a variable
rather than a fixed standard. Accordingly, the need for some
measure of change in quantities of things produced, exchanged,
and sold was supplied by the calculation of a number of
indexes of production.

Among these indexes, those prepared by Stewart,^ King,^
Snyder,^ and others were significant. Somewhat later,
Professor E. E. Day, of the Harvard Committee on Economic Re-
search, prepared quantity indexes for agriculture, manufactur- 
ing, and mining, separately and combined. The methods 
used in these indexes are briefly described below. 

1. THE INDEX OF PHYSICAL PRODUCTION OF THE HARVARD
COMMITTEE ON ECONOMIC RESEARCH 

(1) Index of Agricultural Production^ 
a. Quantity Data 

The annual amounts of production of twelve crops are used, 
the data being drawn from records of the Department of Agri- 
culture, supplemented by similar data from other sources. 

b. Types of Commodities 

For the original index, which covered the period from 1879- 
1920, the annual amounts of production of the following com- 
modities were used: hay, corn, oats, wheat, barley, rye, rice, 
white potatoes, sugar, tobacco, cotton, and flaxseed.

^ Stewart, W. W., “An Index of Production,” The American Economic
Review, March, 1921, pp. 57-81.

^ King, W. I., Bankers’ Statistics Corporation, Special Service, Vol. 2,
No. 12, August 24, 1920.

^ Snyder, Carl (not published). See, however, Income in the United
States, National Bureau of Economic Research, Harcourt Brace, New
York, 1921, p. 79.

^ For a detailed description of this index see Day, E. E., “An Index
of the Physical Volume of Production,” The Review of Economic
Statistics (reprinted from the September, 1920-January, 1921, numbers,
pp. 1-14).

c. The Method of Calculating the Index

Two indexes are constructed: a so-called “unadjusted” and
an “adjusted” index.

The unadjusted index is calculated as follows: (1) the 
quantity each year for each commodity is expressed as a 
relative of the amount in the base period 1909 to 1913; (2) 
the relatives are weighted by the average annual values of 
the individual crops in the same base period, 1909-1913; and 
(3) a weighted geometric mean is taken of the relatives. 

The adjusted index is computed differently, the steps being
to (1) determine the secular trend of the individual series,
(2) express the original items as percentages of the ordinates
of secular trend;^ and (3) take an arithmetic mean of these
percentages.
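
The two procedures may be compared in a short Python sketch. All quantities, values, and trend ordinates are hypothetical, and the trend is assumed to be already determined; the sketch follows the steps as listed above and takes the adjusted index as a simple arithmetic mean.

```python
import math


def unadjusted_index(quantities, base_quantities, base_values):
    """Weighted geometric mean of quantity relatives, the weights being
    the average annual values of the crops in the base period."""
    relatives = {
        name: 100.0 * quantities[name] / base_quantities[name] for name in quantities
    }
    total_weight = sum(base_values[name] for name in relatives)
    log_sum = sum(base_values[name] * math.log(relatives[name]) for name in relatives)
    return math.exp(log_sum / total_weight)


def adjusted_index(quantities, trend_ordinates):
    """Arithmetic mean of the items expressed as percentages of trend."""
    percentages = [
        100.0 * quantities[name] / trend_ordinates[name] for name in quantities
    ]
    return sum(percentages) / len(percentages)
```

With two equally weighted crops standing at 200 and 50 per cent of the base, for instance, the weighted geometric mean is 100, whereas an arithmetic mean of the same relatives would be 125.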

d. The Form and Place of Publication 

Both unadjusted and adjusted series for the period 1879- 
1920 are published by The Harvard Committee.^ 

(2) Index of Mining^
a. Quantity Data 

The basic data for the most part are secured from the 
United States Geological Survey. 

b. Types of Commodities 

For the original index which covered the period 1879 to 
1919, the following commodities were included: gold, silver, 

^ See pp. 440, 446-447, where these terms are defined, and an illustrative
example worked out.

^ Loc. cit. A continuation of the unadjusted and adjusted indexes of
agricultural production, certain modifications having been made from
time to time, is contained in the Review of Economic Statistics, as
follows: Preliminary Volume IV, No. 3, July, 1922, covering the period
1909 to 1921; Preliminary Volume V, No. 3, July, 1923, for the year
1922; Preliminary Vol. VI, No. 3, July, 1924, for the year 1923.

^ See note 4, p. 531.

pig iron, copper, lead, zinc, anthracite coal, bituminous coal,
petroleum, and coke. 

c. The Method of Calculating the Indexes 

Two indexes are computed: (1) an unadjusted, and (2) an 
adjusted index, the methods being identical with those used 
in securing the agricultural number.^ 

d. The Form and Place of Publication 

Both the unadjusted and adjusted indexes are published by 
the Harvard Committee on Economic Research,^ details being 
given for each of the commodities and for the group as a 
whole. 


(3) Index of Manufacture^
a. Quantity Data 

The quantity data are for thirty-three series covering the 
years 1899 to 1919, selection being based upon the availability 
of the data and their importance.^ 


b. Types and Grouping of Commodities 

The thirty-three series are divided into ten groups as follows:
(1) food; (2) textiles; (3) iron and steel; (4) lumber; (5)
liquors; (6) chemicals; (7) stone, glass, and clay products;
(8) metals, non-ferrous; (9) tobacco; and (10) vehicles.

^ See above, p. 532.

^ Loc. cit., Sep., 1920-Jan., 1921, pp. 15-27. Both types of indexes
covering other years are continued in the Review of Economic
Statistics, as follows: Preliminary Vol. IV, No. 3, July, 1922, covering the
years 1909 to 1921; Preliminary Vol. V, No. 3, July, 1923, covering the
year 1922; and in Preliminary Vol. VI, July, 1924, for the year 1923.

^ See note 4, p. 531.

^ An analysis of over 80 series for the Census years 1899, 1904, 1909,
and 1914, in part supplied the basis for the selection of the 33 series used
in the annual index.

c. The Method of Calculating the Indexes

The steps in the calculation of the unadjusted index are as 
follows: (1) computing relatives for each of the thirty-three 
series for each year in terms of the corresponding items in 
the base year, 1909; (2) applying weights to the relatives in 
each series based upon census data for 1909 — the individual 
series, and the groups being separately weighted; (3) adjusting 
the group indexes so as to conform to those secured from a 
similar analysis of the census years;^ and (4) computing a
weighted geometric mean of the group indexes, the weights
for the groups being the values added by manufacture as
reported by the United States Census Bureau.

The adjusted index is calculated as follows: (1) determine 
for each of the 33 series the line of secular trend by the least- 
square method, the period to which the line is fitted being 1899 
to 1913; (2) express the original items year by year as
percentages of trend; (3) apply weights as in step two for the
unadjusted index, and (4) take a weighted arithmetic aver- 
age of the group indexes, the weights being the values added 
by manufacture, as in the unadjusted series. 
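
Steps (1) and (2) of the adjusted index can be illustrated with an ordinary least-squares straight line. The series below is hypothetical, and a five-year fitting period is used for brevity where the text specifies 1899 to 1913.

```python
def least_squares_line(years, values):
    """Fit value = intercept + slope * year by least squares."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def percent_of_trend(year, value, slope, intercept):
    """Express an original item as a percentage of the trend ordinate."""
    return 100.0 * value / (intercept + slope * year)


# Hypothetical series rising by 10 units a year.
YEARS = [1899, 1900, 1901, 1902, 1903]
OUTPUT = [100.0, 110.0, 120.0, 130.0, 140.0]
```

For this series the fitted slope is 10, the trend ordinate for 1904 is 150, and an actual figure of 165 in that year would stand at 110 per cent of trend.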

d. Form and Place of Publication 

Both adjusted and unadjusted indexes are published by the 
Harvard Committee on Economic Research,^ the detail
covering the years 1899 to 1919.

^ See loc. cit., pp. 51 and 54.

^ Loc. cit., September, 1920-January, 1921, pp. 44-63. Similar indexes
for later periods are given in the following numbers of the Review of
Economic Statistics: Preliminary Vol. V, No. 1, January, 1923, pp. 30-60,
covering monthly and annual indexes, 1919 to 1922; Preliminary Vol. V,
No. 3, July, 1923, pp. 205-211; Preliminary Vol. VI, No. 3, July, 1924,
pp. 199-204.

^ See note 4, p. 531.

(4) Combined Index of Agriculture, Mining, and
Manufacture^

The quantity data, and the types and grouping of commodities
are the same as those indicated above under the three
separate indexes. The method of calculating a composite
of the three is as follows:

The combined unadjusted index is secured by calculating 
each year a weighted geometric mean of the three indexes, the 
weights for each index being the aggregate value of production 
in the respective fields during the census year 1909.^

The combined adjusted index is a weighted arithmetic mean 
of the three separate indexes, the weights being the same as 
in the unadjusted index.^ 
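
The difference between the two composites lies only in the kind of mean. A small Python sketch, with hypothetical index values and weights standing in for the 1909 values of production:

```python
import math


def weighted_arithmetic_mean(indexes, weights):
    """Composite of the adjusted type: weighted arithmetic mean."""
    total = sum(weights[name] for name in indexes)
    return sum(weights[name] * indexes[name] for name in indexes) / total


def weighted_geometric_mean(indexes, weights):
    """Composite of the unadjusted type: weighted geometric mean."""
    total = sum(weights[name] for name in indexes)
    log_sum = sum(weights[name] * math.log(indexes[name]) for name in indexes)
    return math.exp(log_sum / total)


# Hypothetical index values and weights.
INDEXES = {"agriculture": 100.0, "mining": 100.0, "manufacture": 400.0}
WEIGHTS = {"agriculture": 1.0, "mining": 1.0, "manufacture": 2.0}
```

On these figures the arithmetic composite is 250 and the geometric composite 200; the geometric mean never exceeds the arithmetic mean computed from the same items and weights.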


2. OTHER INDEXES OF PHYSICAL PRODUCTION 

(1) The Federal Reserve Board

The Federal Reserve Board prepares and publishes each 
month “Indexes of Industrial Activity.” ^ 


(2) The Department of Commerce

The United States Department of Commerce in its monthly 
Survey of Current Business publishes the following indexes: 

a. “A Monthly Index of Manufacturing Production.”^

b. “A Monthly Index of Raw Material Production.”^

c. “A Monthly Index of Mineral Production.”^

d. “A Monthly Index of Forestry Production.”^

^ Loc. cit., pp. 64-68.

^ These indexes were first presented, together with a description of
data and methods, in the Federal Reserve Bulletin, March, 1922. A
revision was made in March, 1924, the method being described in Bulletin,
March, 1924, pp. 183-188.

^ See Survey of Current Business, January, 1923, pp. 22-28, for a
description of the contents of this index and the method by which it is
calculated.

^ Ibid., Sept., 1922, pp. 22-24.

^ Ibid., May, 1922, pp. 19-22.

^ Ibid., August, 1922, pp. 18-21.

IV. Indexes of Volume of Trade

1. “PERSONS'” INDEX OF THE HARVARD COMMITTEE ON
ECONOMIC RESEARCH

“An Index of Trade for the United States”^ for the years
1903-1923 is of a somewhat different type from those which 
have been termed production indexes, or price indexes designed 
to measure cyclical fluctuations in business.^ The object in 
this case is to so combine series, such as bank clearings outside 
New York, values of imports of merchandise, gross earnings 
of railroads, production of pig iron, and the relative number 
of wage earners employed in industrial establishments, that the 
resulting index will “be responsive to variations in the general 
physical volume of trade.” ^ The manner in which this is 
done is interesting but too detailed to be outlined in this place. 

2. “SNYDER'S” NEW INDEX OF THE VOLUME OF TRADE^

This index is a weighted composite of 56 different series of 
monthly data grouped into 28 major classes, covering, among 
other things, productive activity; primary distribution, such as 
car loadings, wholesale trade, exports and imports, etc.; dis- 
tribution to consumers, such as department store sales, chain 
store sales, mail order sales, etc.; general business activity, in- 
cluding shares sold on the New York stock exchange, new 
corporate financing, etc. 

All of these various series, comprising the “immensely
greater part of the nation's trade, probably 80 per cent and
more,”^ are combined into a single index in the belief that

^ Persons, W. M., Review of Economic Statistics, Preliminary Vol. V,
No. 2, April, 1923, pp. 71-78.

^ See the description of the Harvard Ten Commodity Index, supra,
pp. 529-530.

^ Persons, W. M., loc. cit., p. 78.

^ Fully described in the Journal of the American Statistical
Association, December, 1923, pp. 949-963.

^ Ibid., p. 950.

together they represent an index of trade far better than does
the production of basic commodities alone. The different 
series are reduced to a common denominator in terms of 
their normal growth, seasonal variation, where important, be- 
ing allowed for, and price changes eliminated. The index is 
computed as percentages of normal trend.^ 

3. OTHER TRADE INDEXES 
(1) The Federal Reserve Board 

a. “An Index of the Trend of Retail Trade.”^

b. “An Index of Wholesale Trade.”^

(2) The United States Department of Commerce

a. “Monthly Index of Crop Marketings.”^

b. “Monthly Index of Marketing of Animal Products.”^

V. Indexes of General Business Conditions 

The foregoing indexes for the most part relate to specific 
phenomena, such as prices; production, including agriculture, 
mining, manufacturing; trade; marketing; etc. They are 
not designed to serve as barometers or as forecasters of busi- 
ness change through periods of depression, recovery, prosperity, 
financial strain and crisis. That is, they have to do not so 
much with defining and with timing the period of these shifts 
in business as they do with measuring on a relative basis the 
changes which take place. 

But there is another type of index which remains to be 

^ For details see note 4, p. 536.

^ For the method used in constructing this index, see the Federal
Reserve Bulletin, January, 1924, pp. 17-19.

^ Ibid., April, 1923, pp. 439-442.

^ For the method used in computing this index, see the Survey of
Current Business, July, 1922, pp. 17-21.

^ Ibid., June, 1922, pp. 18-21.

described. It has to do with a measurement of general business
conditions, and with a forecast of what they are likely to
be in the future. 

Certain aspects of the business cycle, so-called, were de- 
scribed in Chapter XIV as a background for the special treat- 
ment of time series. Something more, however, needs to be 
said about it. 

Business conditions are always in a state of flux: they are
never “normal” in the sense of being stationary. But the
changes through which they pass are not fortuitous or
haphazard. This has been demonstrated beyond question of
doubt. Neither are they perfectly regular and periodic. The
ups and downs of business do have characteristic features,
however, and probably do not vary more than a few per cent^
from what may be termed normal. Business in general, and
certain of its specific phenomena, pass through well-defined
major and minor movements. Accordingly, it is possible to
determine their order and the relations between them, to set
up a measure of present conditions, and to give a forecast of
what those in the immediate future are likely to be. This
is what is done by the Harvard Committee on Economic
Research, for instance, in its “Index of General Business
Conditions,” described below.

1. THE HARVARD INDEX OF GENERAL BUSINESS CONDITIONS

In the December, 1916, number of the American Economic
Review,^ Professor Warren M. Persons published a significant
article. By the use of the correlation coefficient, he established
the time fluctuations between a large number of series of
business data and sorted out certain series which he called “a
business barometer.” Certain other series he found had
forecasting properties. With this contribution as a beginning, the
Harvard Committee on Economic Research now publishes
weekly, as a part of its Economic Service, an “Index of
Business Conditions.”

^ Snyder claims not more than 5 per cent plus or minus from “normal.”

^ Persons, W. M., “The Construction of a Business Barometer Based
Upon Annual Data,” American Economic Review, December, 1916, pp.
739-769.

Its barometer and forecaster are not based upon the theory 
that the cycles of business are perfectly periodic, nor upon 
the assumption that “for every action in business there is 
necessarily an equal and opposite reaction.” They are rather 
founded upon the results of an elaborate study of data through 
the period 1903 to 1914 which showed that there is a “sequence 
in movements in the speculative, business, and money markets 
which can be measured statistically, and shown graphically 
on an index chart.”^

The chart covering the trial period, 1903 to 1914, is shown 
in Figure 88. 

An inspection of this chart shows the following important 
relations: (1) an interval of several months between the 
movements in the curves of speculation, of business and of 
money; and (2) the same order in the upward and downward 
movements and turning points of the curves. The movements 
are as follows: those in Curve “A” precede from six to ten 
months those in Curve “B”; those in Curve “B” precede from 
two to eight months those in Curve “C.” “It is the regularity 
in the sequence of the movements of the three curves which 
affords a logical basis for scientific business forecasting. Curve 
‘A’ moves first, ‘B’ second, ‘C’ third: speculation, business,
money.”^
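
One way to measure such a lead between two curves is to correlate the earlier curve with the later one at successive lags and note the lag giving the highest coefficient. The Python sketch below illustrates that idea on invented data; it is not a reconstruction of the Harvard Committee's procedure, and the function names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_lead(leader, follower, max_lag):
    """Lag (in periods) at which the leader correlates most closely
    with later values of the follower."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]
        ys = follower[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get)


# Invented data: the follower repeats the leader two periods later.
LEADER = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
FOLLOWER = [0, 0] + LEADER[:-2]
```

Applied to monthly series, a result of this kind would correspond to statements such as "Curve A precedes Curve B by from six to ten months."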

The interpretation of this index is based upon “(1) the 
direction of the movement of each curve in relation to the 
movements of the other curves; (2) the direction of the im- 
mediately preceding movement; (3) the magnitude of such 
movements.”^

^ “Harvard Index of General Business Conditions: Its Interpretation,”
Harvard University Committee on Economic Research, Cambridge,
Mass., 1923, p. 8.

^ Op. cit., p. 9.

^ Ibid., p. 13.

The index was published in the form shown in Figure 88
until May 19, 1923. Since this date a similar chart (see
Figure 89) has been published currently in the Harvard
Economic Service.^

Curve “A,” speculation, is now based upon New York
bank debits and industrial stock prices; Curve “B,” business,
upon bank debits outside New York City and commodity
prices; Curve “C,” money, upon interest rates on
4-6 months good, and 4-6 months prime commercial paper.
While the new curves are based upon somewhat different data 
from the old ones, they have the same function, and their move- 
ments are to be interpreted in the same way as before the 
change was made. 

2. THE BROOKMIRE FORECASTING COMPOSITE LINE ^ 

The forecasting line prepared by the Brookmire Economic
Service is not designed to show the state of business, but
rather to forecast stock and commodity prices. It is made
from a simple average of the following six series, all of which 
are treated for seasonal variation and some of them for 
secular trend: (1) the prices of 40 industrial and railroad 
stocks on the New York exchange, multiplied by the number of 
shares sold on this exchange; (2) a variety of series indicative 
of physical production; (3) the ratio of the value of imports 
to the value of exports; (4) the turnover of bank deposits; (5) 
interest rates on 4-6 months' commercial paper; and (6) the
open market rate for three months’ bills in London. 

Averages for current months are compared with those for 
1904-1913, the relative fluctuations being expressed in terms of 
the maximum. The amounts are plotted on semi-logarithmic 

^ See Persons, W. M., “The Revised Index of General Business
Conditions,” The Review of Economic Statistics, July, 1923, pp. 187-195,
for an account of the necessity for revision, and the method of
accomplishing it.

^ See Vance, Ray, Business and Investment Forecasting, The
Brookmire Economic Service, New York, 1922, for a description of the method
of computing the forecasting line.

paper, which has the effect of toning down the extreme
variations.

The direction in which the line is drawn from month to 
month depends upon the size of the average of the six factors 
as compared with that for 1904 to 1913. When the average 
is within a neutral zone of about 3 per cent above and below 
the base line, the change recorded by it is held to have no
significance, the new point on the forecasting line being
moved horizontally. When the average is out of the zone, it 
is held to be significant for forecasting purposes. If, however, 
within four months it crosses the neutral zone again, the 
whole movement is disregarded. 
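
The neutral-zone rule can be sketched in Python as follows. The base of 100 and the sample averages are hypothetical. The text states only that the line moves horizontally inside the zone; that it rises when the average is above the zone and falls when below is an assumption made for illustration. The four-month reversal rule is left out, since the text does not say exactly how the disregarded movement is redrawn.

```python
def zone(average, base=100.0, neutral_pct=3.0):
    """Classify an average of the six factors against the 1904-1913 base."""
    upper = base * (1.0 + neutral_pct / 100.0)
    lower = base * (1.0 - neutral_pct / 100.0)
    if average > upper:
        return "above"
    if average < lower:
        return "below"
    return "neutral"


def line_steps(averages, base=100.0, neutral_pct=3.0):
    """Monthly step of the forecasting line: 0 is a horizontal move.

    Assumption: +1 (up) above the zone and -1 (down) below it.
    """
    steps = {"above": 1, "below": -1, "neutral": 0}
    return [steps[zone(a, base, neutral_pct)] for a in averages]
```

For averages of 101.0, 104.5, 98.0, and 95.0 on a base of 100, the steps under these assumptions are horizontal, up, horizontal, and down.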

The forecasting curve is held to anticipate by one month 
changes in stock prices, and by six to seven months, changes 
in commodity prices. 

3. OTHER BAROMETRIC AND FORECASTING INDEXES 

Space is not available in which to describe the following
indexes: the “Compositplot” of the Babson Statistical
Organization;^ the “money,” the “stock price,” and the “business”
curves of the Standard Statistics Corporation;^ and the
Annalist’s “Barometer and Business Index Line.”^ The reader,
however, will find a study of the “services” of these and other
organizations of interest.

VI. Other Indexes of Business and Economic Phenomena 

Business and statistical literature are filled with “indexes” 
of various types. It is inadvisable, however, in this place to 
do more than mention some of those which are outstanding. 
This is done in bibliographical form below. 

^See Knauth, Oswald W., “Statistical Indexes of Business Conditions 
and Their Uses,” in Business Cycles and Unemployment, McGraw-Hill, 
New York, 1923, pp. 364-368. 

^ Explained in The Annalist, March 28 and October 24, 1921. 
1. MONEY AND PRICES

“An Index Chart Based on Price and Money Rates.” ¹

“Index of the General Price Level.” ²

“Index of Velocity of Bank Deposits.” ³

“A New Clearings Index of Business for Fifty Years.” ⁴

“International Price Indexes.” ⁵

2. EMPLOYMENT AND UNEMPLOYMENT 

“Index of Employment in Manufacturing Industries.” ⁶

“An Index of the Labor Market.” ⁷

“Employment and the Business Cycle.” ⁸

“Fluctuations of Employment in Cities of the United States,
1902-1917.” ⁹

¹ Persons, W. M., Review of Economic Statistics, January, 1922,
pp. 7-11.

² Snyder, Carl, Journal of the American Statistical Association, June,
1924, pp. 189-195.

³ Described by Burgess, W. Randolph, Journal of the American
Statistical Association, June, 1923, pp. 727-740; Federal Reserve Bulletin,
May, 1923; compared with Snyder’s Volume of Trade Index by Snyder,
Carl, “A New Index of Business Activity,” Journal of the American
Statistical Association, March, 1924, pp. 36-41.

⁴ Snyder, Carl, Journal of the American Statistical Association,
September, 1924, pp. 329-335.

⁵ See Federal Reserve Bulletin, February, 1922, pp. 147-153; July,
1922, pp. 801-806; August, 1922, pp. 922-929; September, 1922, pp.
1052-1059. See also Snodgrass, Katharine, “A New Price Index for
Great Britain,” Journal of the American Statistical Association, June,
1922, pp. 241-249.

⁶ Federal Reserve Bulletin, December, 1923, pp. 1272-1279. The
method of preparing this index was planned, and its construction
supervised, by Professor W. A. Berridge. See also Berridge, W. A., “Cycles
of Unemployment in the United States,” Houghton Mifflin, Boston,
1923, for an account of the uses of such an index.

⁷ Federal Reserve Bulletin, February, 1924, pp. 83-87. This index was
planned by Dr. Berridge, Brown University.

⁸ Berridge, W. A., Review of Economic Statistics, January, 1922, pp.
12-51. Also similar articles by the same author in Journal of the
American Statistical Association, March, 1922, pp. 42-55; and June, 1922,
pp. 227-240.

⁹ Hart, Hornell, “Employment Fluctuations in the United States 1902-
1917,” Studies of the Helen S. Trounstine Foundation, Cincinnati, 1918,
Vol. I, pp. 47-59.
“An Index of Factory Employment in Illinois.” ¹

“An Index of the Number of Applicants per One-hundred
Positions Open at Illinois Free Employment Offices.” ²

3. INDEX OF FOREIGN EXCHANGE RATES ³

4. INDEXES OF DISTRIBUTION

“Department Store Stocks.” ⁴

“Department Store Sales.” ⁵

5. INDEXES OF SECURITY PRICES ⁶

“An Index of Industrial Stock Prices.” ⁷

“A Monthly Index of Bond Yields, 1919-1923.” ⁸

6. INDEXES OF EARNINGS AND WAGE-RATES 

Index numbers of trends of hourly wage-rates, weekly wage- 
rates, and weekly hours, are published by the United States 
Bureau of Labor Statistics. The methods used are described 
by the Bureau as follows: 

“In computing the index numbers for a trade, the first step is 
to obtain the average rate for the trade, which is done by multiply- 

¹ Method of construction described in Annual Report, Illinois
Department of Labor, 1923, Springfield, Ill. Current data appear in The
Labor Bulletin, issued monthly by Illinois Department of Labor, Chicago,
Ill.

² Ibid.

³ See Federal Reserve Bulletin, July, 1921, pp. 794-799; see also a
criticism of this index by Davis, J. S., “Index Numbers of Foreign
Exchange,” Quarterly Journal of Economics, May, 1922, pp. 535-542; and
a reply by Goldenweiser, E. A., in Quarterly Journal of Economics,
November, 1922, pp. 191-195.

⁴ Federal Reserve Bulletin, March, 1924, pp. 189-190.

⁵ Ibid., January, 1924, pp. 17-21.

⁶ For a comprehensive discussion of the problem of constructing index
numbers of stock prices, see Mitchell, W. C., “A Critique of Index
Numbers of Prices of Stocks,” Journal of Political Economy, July, 1916, pp.
625-693.

⁷ Frickey, Edwin, Review of Economic Statistics, August, 1921, pp.
264-277.

⁸ Maxwell, F. W., and Matthews, A. M., Ibid., July, 1923, pp. 212-217.
ing the rate per hour in each city by the number of union members
in the city, adding the products, and dividing by the aggregate
number of union members in the country entering into the total. These
averages are brought into comparison with the average for the base
year to determine the index number for each year. Grand average
hourly rate, full-time weekly earnings, and weekly hours for all
trades combined are obtained in the same manner as the
corresponding figures were obtained for each of the several trades.” ¹

“Course of Average Weekly Earnings in New York State
Factories: An Index.” ²

“Index Numbers for the Wages of Common Labor.” ³
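
The Bureau's procedure for a single trade, quoted above, amounts to an average of city rates weighted by union membership, followed by a comparison with the base year. A minimal sketch follows. The city figures and the years are invented for illustration, the function names are hypothetical, and setting the base year equal to 100 is an assumption, since the quotation speaks only of comparison with the base-year average.

```python
def weighted_average_rate(city_data):
    """Average hourly rate for a trade, weighted by union membership.

    `city_data` is a list of (rate_per_hour, union_members) pairs,
    one pair for each city entering into the total.
    """
    total_members = sum(members for _, members in city_data)
    total_products = sum(rate * members for rate, members in city_data)
    return total_products / total_members


def index_numbers(averages_by_year, base_year):
    """Express each year's average relative to the base year (assumed = 100)."""
    base_average = averages_by_year[base_year]
    return {year: 100.0 * average / base_average
            for year, average in averages_by_year.items()}


# Hypothetical figures for one trade in two cities (dollars per hour, members).
rates_by_year = {
    1913: [(0.50, 1000), (0.60, 3000)],
    1923: [(0.75, 1000), (0.90, 3000)],
}
averages = {year: weighted_average_rate(data)
            for year, data in rates_by_year.items()}
indexes = index_numbers(averages, base_year=1913)
```

With these invented figures the weighted averages are 0.575 and 0.8625 dollars per hour, so that the index for the later year stands at about 150 on the assumed base of 100. The grand averages for all trades combined would be obtained by applying the same weighting to every city and trade together.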

VII. Conclusion 

It is hoped that this chapter is more than informative. To 
know even in detail the methods which are used in computing 
different index numbers is of little importance if in their use 
the principles underlying the methods are ignored or forgotten. 

The need for index numbers of various kinds, constructed 
according to different patterns, helps partly but not wholly 
to explain the variety of types available. Too frequently, in 
the past, methods were followed because they were simple 
rather than because they were appropriate. So long as in- 
formation was lacking as to the relations between methods and 
results, there was some justification for this condition. But 
that time has passed. There is no excuse to-day for the mis- 
taken belief that all index numbers are equally good, and that 
from those available relating to prices, trade, unemployment, 
etc., selection may be made at random in order to measure 
business, social, and industrial changes. 

¹ “Methods of Procuring and Computing Statistical Information of the
Bureau of Labor Statistics,” Bulletin 328, United States Bureau of
Labor Statistics, Washington, D. C., March, 1923, p. 3. Current data
appear in the Monthly Labor Review for each year, and are cumulated
in the report called Union Scale of Wages and Hours of Labor.

² Published monthly in The Industrial Bulletin, Industrial Commission
of New York State, Albany, New York.

³ Burgess, W. Randolph, Journal of the American Statistical
Association, March, 1922, pp. 101-103.



* From Logarithmic and Trigonometric Tables, Revised Edition, edited
by E. R. Hedrick. Copyright, 1920, by The Macmillan Company.
Reprinted by permission of the editor and publishers.



